Abstract

We study a Markov process with two components: the first component evolves accord- 
ing to one of finitely many underlying Markovian dynamics, with a choice of dynam- 
ics that changes at the jump times of the second component. The second component 
is discrete and its jump rates may depend on the position of the whole process. Under 
regularity assumptions on the jump rates and Wasserstein contraction conditions for 
the underlying dynamics, we provide a concrete criterion for the convergence to equi- 
librium in terms of Wasserstein distance. The proof is based on a coupling argument 
and a weak form of the Harris Theorem. In particular, we obtain exponential ergodicity 
in situations which do not verify any hypoellipticity assumption, but are not uniformly 
contracting either. We also obtain a bound in total variation distance under a suitable 
regularising assumption. Some examples are given to illustrate our result, including a 
class of piecewise deterministic Markov processes. 
1 Introduction

Markov processes with switching are intensively used for modelling purposes in
applied subjects like biology [CMMS12, CDMR12, FGM12], storage modelling [BKKP05],
and neuronal activity [PTW12, GT11]. This class of Markov processes is reminiscent of
the so-called iterated random functions [DF99] or branching processes in random
environment [Smi68] in the discrete time setting. Several recent works [BH12, BGM10,
BLMZ12b, BLMZ12c, CD08, dSY05, GIY04, GG96] deal with their long time
behaviour (existence of an invariant probability measure, Harris recurrence, exponential
ergodicity, hypoellipticity...). In particular, in [BH12, BLMZ12b], the authors
provide a kind of hypoellipticity criterion with Hörmander-like bracket conditions. Under
these conditions, they deduce the uniqueness and absolute continuity of the invariant 
measure, provided that a suitable tightness condition is satisfied. They also obtain geo- 
metric convergence in the total variation distance. Nevertheless, there are many simple 
processes with switching which do not verify any hypoellipticity condition. To
illustrate this fact, let us consider the simple example of [BLMZ12c]. Let $(X, I)$ be the
Markov process on $\mathbb{R}^2 \times \{-1, 1\}$ generated by

$$Af(x, i) = -(x - (i, 0)) \cdot \nabla_x f(x, i) + (f(x, -i) - f(x, i)). \tag{1.1}$$

This process is ergodic and the first marginal $\pi$ of its invariant measure is supported on
$\mathbb{R} \times \{0\}$. Thus, in general, it does not converge in the total variation distance. However,
it is proved in [BLMZ12c] that it converges in a certain Wasserstein distance. Let us
recall that the Wasserstein distance on a Polish space $(E, d)$ is defined by

$$W_d(\mu_1, \mu_2) = \inf \mathbb{E}[d(X_1, X_2)],$$

for all probability measures $\mu_1, \mu_2$ on $E$, where the infimum is taken over all pairs
of random variables $X_1, X_2$ with respective laws $\mu_1, \mu_2$. The Kantorovich-Rubinstein
duality [Vil09, Theorem 5.10] shows that one also has

$$W_d(\mu_1, \mu_2) = \sup_{f \in \mathrm{Lip}_1} \int_E f \, d\mu_1 - \int_E f \, d\mu_2,$$

where $f : E \to \mathbb{R}$ is in $\mathrm{Lip}_1$ if and only if it is a 1-Lipschitz function, namely

$$\forall x, y \in E, \quad |f(x) - f(y)| \le d(x, y).$$
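
As a small illustration of the coupling definition (not taken from the paper), the following sketch computes the Wasserstein distance between two empirical measures on the real line for $d(x, y) = |x - y|$. It assumes equal sample sizes and uniform weights; in that setting the sorted pairing is an optimal coupling. The function name `wasserstein1_empirical` is hypothetical.

```python
def wasserstein1_empirical(xs, ys):
    """W_1 distance between two empirical measures on the real line.

    Assumes both samples have the same size n and that each point carries
    mass 1/n. For d(x, y) = |x - y| the monotone (sorted) pairing is an
    optimal coupling, so the infimum over couplings is attained by sorting.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)


# Two point masses at 0 and at eps: the Wasserstein distance is eps.
eps = 1e-3
print(wasserstein1_empirical([0.0], [eps]))
# After sorting, every atom of the second sample is the first one shifted by 1.
print(wasserstein1_empirical([0.0, 1.0, 5.0], [6.0, 1.0, 2.0]))
```

Two Dirac masses at distance `eps` are therefore close in this distance however small `eps` is, which is the kind of convergence that the example (1.1) exhibits.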

The total variation distance $d_{TV}$ can be viewed as the Wasserstein distance associated
to the trivial distance function, namely

$$d_{TV}(\mu_1, \mu_2) = \inf \mathbb{P}(X_1 \ne X_2) = \frac{1}{2} \sup_{\|f\|_\infty \le 1} \int_E f \, d\mu_1 - \int_E f \, d\mu_2,$$

where the infimum is again taken over all random variables $X_1, X_2$ with respective
distributions $\mu_1, \mu_2$. In the present article, we will give convergence criteria for a
general class of switching Markov processes. These processes are built from the following
ingredients:

• a Polish space $(E, d)$ and a finite set $F$;

• a family $(Z^{(n)})_{n \in F}$ of $E$-valued strong Markov processes represented by their
semigroups $(P^{(n)})_{n \in F}$, or equivalently by their generators $(\mathcal{L}^{(n)})_{n \in F}$ with
domains $(\mathcal{D}^{(n)})_{n \in F}$;

• a family $(a(\cdot, i, j))_{i, j \in F}$ of non-negative functions on $E$.

We are interested in the process $(\mathbf{X}_t)_{t \ge 0} = (X_t, I_t)_{t \ge 0}$, defined on $\mathbf{E} = E \times F$,
which jumps between these dynamics. Roughly speaking, $X_t$ behaves like $Z^{(I_t)}$ as
long as $I$ does not jump. The process $I$ is discrete and jumps at a rate given by $a$. More
precisely, the dynamics of $(\mathbf{X}_t)_{t \ge 0}$ is as follows:
• Given a starting point $(x, i) \in E \times F$, we take for $Z^{(i)}$ an instance as above
with initial condition $Z^{(i)}_0 = x$. The initial conditions for $Z^{(j)}$ with $j \ne i$ are
irrelevant.

• The discrete component $I$ is constant and equal to $i$ until the time $T = \min_{j \in F} T_j$,
where $(T_j)_{j \in F}$ is a family of random variables that are conditionally independent
given $Z^{(i)}$ and that verify

$$\forall j \in F, \quad \mathbb{P}(T_j > t \mid \mathcal{F}_t) = \exp\left(-\int_0^t a(Z^{(i)}_s, i, j) \, ds\right),$$

where $\mathcal{F}_t = \sigma\{Z^{(i)}_s \mid s \le t\}$.

• For all $t \in [0, T)$, we then set $X_t = Z^{(i)}_t$ and $I_t = i$.

• At time $T$, there exists a unique $j \in F$ such that $T = T_j$ and we set $I_T = j$ and
$X_T = Z^{(i)}_T$.

• We take $(X_T, I_T)$ as a new starting point at time $T$.
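
The construction above can be sketched in code under simplifying assumptions that are not part of the paper: two regimes $F = \{-1, 1\}$, deterministic underlying dynamics $\dot{x} = -(x - i)$ on the real line (the first coordinate of example (1.1)), and a switching rate bounded by a known constant. With a bounded rate, drawing candidate times at the constant bound and accepting them with probability `rate / rate_bound` (thinning) gives a switching time with the same law as $T$. All names (`flow`, `simulate`, `rate_bound`) are hypothetical.

```python
import math
import random


def flow(x, i, t):
    """Exact solution at time t of dx/dt = -(x - i) started from x."""
    return i + (x - i) * math.exp(-t)


def simulate(x, i, horizon, rate, rate_bound, rng):
    """Return (X_T, I_T, number of switches) at time T = horizon.

    rate(x, i) is the rate of the switch i -> -i and must satisfy
    0 <= rate(x, i) <= rate_bound. Candidate switching times are drawn at
    the constant rate rate_bound and accepted with probability
    rate(x, i) / rate_bound (thinning).
    """
    t, switches = 0.0, 0
    while True:
        tau = rng.expovariate(rate_bound)
        if t + tau >= horizon:
            return flow(x, i, horizon - t), i, switches
        x = flow(x, i, tau)      # continuous component follows Z^(i)
        t += tau
        if rng.random() < rate(x, i) / rate_bound:
            i = -i               # only the discrete component jumps
            switches += 1


rng = random.Random(0)
# Hypothetical state-dependent rate, taking values in [0.5, 1.5].
rate = lambda x, i: 1.0 + 0.5 * math.tanh(i * x)
x_T, i_T, n_switches = simulate(0.3, 1, 10.0, rate, 2.0, rng)
print(x_T, i_T, n_switches)
```

Since each flow pulls the continuous component towards $i \in \{-1, 1\}$, a trajectory started in $[-1, 1]$ stays there, whatever the switching times are.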

Let us make a few remarks about this construction. First, this algorithm guarantees the
existence of our process under the condition that there is no explosion in the
switching rate. In other words, our construction is global as long as $I$ only switches value
finitely many times in any finite time interval. Assumption 1.1 below will be sufficient
to guarantee this non-explosion. Also note that, in general, $X$ and $I$ are not Markov
processes by themselves, contrary to $\mathbf{X}$. Nevertheless, $I$ is a Markov
process if $a$ does not depend on its first component. The construction given above shows
that, provided that there is no explosion, the infinitesimal generator of $\mathbf{X}$ is given by

$$\mathbf{L}f(x, i) = \mathcal{L}^{(i)} f(x, i) + \sum_{j \in F} a(x, i, j) \left(f(x, j) - f(x, i)\right), \tag{1.2}$$

for any bounded function $f$ such that $f(\cdot, i)$ belongs to $\mathcal{D}^{(i)}$ for every $i \in F$. We will
denote by $(\mathbf{P}_t)_{t \ge 0}$ the semigroup of $\mathbf{X}$. To guarantee the existence of our process, we
will consider the following natural assumption:

Assumption 1.1 (Regularity of the jump rates). The following boundedness condition
is verified:

$$\bar{a} = \sup_{x \in E} \sup_{i \in F} \sum_{j \in F} a(x, i, j) < +\infty,$$

and the following Lipschitz condition is also verified:

$$\sup_{i \in F} \sum_{j \in F} |a(x, i, j) - a(y, i, j)| \le \kappa \, d(x, y),$$

for some $\kappa > 0$.

We will also assume the following hypothesis to guarantee the recurrence of $I$:

Assumption 1.2 (Recurrence assumption). The matrix $(\underline{a}(i, j))_{i, j \in F}$ defined by

$$\underline{a}(i, j) = \inf_{x \in E} a(x, i, j),$$

yields the transition rates of an irreducible and recurrent Markov chain.
With these two assumptions, we are able to get exponential stability in two
situations. The first situation is one where each underlying dynamics does on average yield
a contraction in some Wasserstein distance, but no regularising assumption is made. 
The second situation is the opposite, where we replace the contraction by a suitable 
regularising property. 

1.1 Two criteria without hypoellipticity assumption 

In this section, we assume that we have some information on the Lipschitz contraction 
(or expansion) of our underlying processes: 

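Assumption 1.3 (Lipschitz contraction). For each $i \in F$, there exists $\rho(i) \in \mathbb{R}$ such
that

$$\forall t \ge 0, \quad W_d(\mu P_t^{(i)}, \nu P_t^{(i)}) \le e^{-\rho(i) t} \, W_d(\mu, \nu), \tag{1.3}$$

for any two probability measures $\mu, \nu$. Furthermore there exist $x_0 \in E$ and $t_{x_0} > 0$
such that if $V_{x_0} : x \mapsto d(x, x_0)$ then

$$\sup_{t \in [0, t_{x_0}]} P_t^{(i)} V_{x_0}(x_0) < +\infty.$$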

Verifying equation (1.3) is not much of a restriction because we do not assume
that $\rho(i) > 0$. The best constant in this inequality is called the Wasserstein curvature
in [Jou07, Jou09] and the coarse Ricci curvature in [Oll09, Oll10], since it is heavily
related to the geometry of the underlying space as illustrated in [vRS05, Theorem 2].
If $\rho(i) > 0$, then we can deduce some properties like geometric ergodicity, a Poincaré
inequality or some concentration inequalities [Clo12, Jou07, Jou09, HSV, Oll10]. A
trivial bound on $\rho(i)$ is given in the special case of diffusion processes in Section 4.1.

The bound (1.3) is quite stringent since, if $\rho(i) > 0$, it implies that there is some
Wasserstein contraction for every $t > 0$ and not just for sufficiently long times. This
is essentially equivalent to the existence of a Markovian coupling between two
instances $X_t$ and $Y_t$ of the Markov process with generator $\mathcal{L}^{(i)}$ such that
$\mathbb{E}\, d(X_t, Y_t) \le e^{-\rho(i) t} d(X_0, Y_0)$.

In principle, this condition could be slightly relaxed by the addition of a proportion- 
ality constant Ct, provided that one assumes that the switching rate of the process is 
sufficiently slow. This ensures that, most of the time, it spends a sufficiently long time 
in any one state for this proportionality constant not to play a large role. 

One could also imagine allowing for jumps of the component in E at the switching 
times, and this would lead to a similar difficulty. 

In the same way, note that the distance $d$ appearing in Assumption 1.3 is the same for every
$i$ and that it does not allow for a constant prefactor in the right hand side of (1.3). This
may seem like a very strong assumption since usual convergence theorems, like Harris'
theorem, do not give this kind of bound. We will see however in Section 5 an example
which illustrates that there is no obvious way in general to weaken this condition. The
intuitive reason why this is so is that if the process switches rapidly, then it is crucial
to have some local information (small times) and not only global information (large
times) on the behaviour of each underlying dynamics.

We now have presented all the assumptions that are necessary to state our main 
results. The first one describes the simplest situation, that is when a does not depend 
on its first component: 
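Theorem 1.4 (Wasserstein exponential ergodicity in the constant case). Under
Assumptions 1.1, 1.2 and 1.3, if $a(x, i, j)$ does not depend on $x$ and the Markov process $I$ has
an invariant probability measure $\nu$ verifying

$$\sum_{i \in F} \nu(i) \rho(i) > 0,$$

then there exist a probability measure $\pi$ and some constants $C, \lambda, t_0 > 0$ such that

$$\forall t \ge t_0, \quad W_{\mathbf{d}}(\delta_{\mathbf{y}_0} \mathbf{P}_t, \pi) \le C e^{-\lambda t} \left(1 + W_{d^q}(\delta_{\mathbf{y}_0}, \pi)\right),$$

for every $\mathbf{y}_0 = (y_0, j_0) \in \mathbf{E}$, where the distance $\mathbf{d}$, on $\mathbf{E}$, is defined by

$$\mathbf{d}(\mathbf{x}, \mathbf{y}) = \mathbf{1}_{i \ne j} + \mathbf{1}_{i = j} \left(1 \wedge d^q(x, y)\right), \tag{1.4}$$

for every $\mathbf{x} = (x, i)$, $\mathbf{y} = (y, j)$ belonging to $\mathbf{E}$, $x_0$ is as in Assumption 1.3 and
$q \in (0, 1]$.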
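
To make the averaged contraction condition of this theorem concrete, here is a minimal sketch (not from the paper) that computes the invariant measure $\nu$ of a finite rate matrix and evaluates $\sum_i \nu(i) \rho(i)$. The rate matrix `Q`, the vector `rho` and both function names are hypothetical; the invariant measure is obtained by power iteration on a uniformised transition kernel, which assumes the chain is irreducible.

```python
def invariant_measure(Q, iterations=10_000):
    """Invariant probability vector of an irreducible rate matrix Q.

    Q[i][j] is the jump rate from i to j (i != j) and rows sum to zero.
    Uses power iteration on the uniformised kernel P = I + Q / c.
    """
    n = len(Q)
    c = 2.0 * max(-Q[i][i] for i in range(n))  # c exceeds every exit rate
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / c for j in range(n)]
         for i in range(n)]
    nu = [1.0 / n] * n
    for _ in range(iterations):
        nu = [sum(nu[i] * P[i][j] for i in range(n)) for j in range(n)]
    return nu


def mean_contraction(Q, rho):
    """Average of the contraction rates rho under the invariant measure."""
    nu = invariant_measure(Q)
    return sum(p * r for p, r in zip(nu, rho))


# Two regimes: state 0 contracts at rate 1, state 1 expands at rate 0.5.
Q = [[-1.0, 1.0], [3.0, -3.0]]
rho = [1.0, -0.5]
print(invariant_measure(Q))      # close to [0.75, 0.25]
print(mean_contraction(Q, rho))  # close to 0.625, so the criterion holds
```

In this example one regime is expanding, yet the process spends three quarters of its time in the contracting regime, so the mean contraction rate is positive.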

This statement is not surprising: it states that if the process contracts in mean, then
it converges exponentially to an invariant distribution. The conditions are rather sharp
as will be illustrated in Section 5. In particular, we recover [BLMZ12c, Theorem 1.10]
and this (slight) generalisation could be deduced from the argument given there. Using
Hölder's inequality, we can also deduce convergence in the $p$-th Wasserstein distance
$W_d^{(p)}$ with $p \ge 1$ provided that $X$ satisfies a moment condition. We give the previous
theorem and its proof for the sake of completeness and for a better understanding of the
more complicated case, where $a$ is allowed to depend on its first argument. That is

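Theorem 1.5 (Wasserstein exponential ergodicity with an on-off type criterion). Let
us suppose that Assumptions 1.1, 1.2 and 1.3 hold. We set

$$F_0 = \{i \in F \mid \rho(i) > 0\} \quad \text{and} \quad F_1 = \{i \in F \mid \rho(i) \le 0\},$$

$$\rho_0 = \min_{i \in F_0} \rho(i) > 0 \quad \text{and} \quad \rho_1 = \min_{i \in F_1} \rho(i) \le 0,$$

$$a_0 = \max_{i \in F_0} \sup_{x \in E} \sum_{j \in F_1} a(x, i, j) \quad \text{and} \quad a_1 = \min_{i \in F_1} \inf_{x \in E} \sum_{j \in F_0} a(x, i, j).$$

If

$$\rho_0 a_1 + \rho_1 a_0 > 0,$$

then there exist a probability measure $\pi$ and some constants $C, \lambda, t_0 > 0$ such that

$$\forall t \ge t_0, \quad W_{\mathbf{d}}(\delta_{\mathbf{y}_0} \mathbf{P}_t, \pi) \le C e^{-\lambda t} \left(1 + W_{d^q}(\delta_{\mathbf{y}_0}, \pi)\right),$$

for every $\mathbf{y}_0 = (y_0, j_0) \in \mathbf{E}$, where the distance $\mathbf{d}$, on $\mathbf{E}$, is defined in (1.4), $x_0$ is as in
Assumption 1.3 and $q \in (0, 1]$.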

With this result, we not only recover [BLMZ12c, Theorem 1.15], but we extend
it significantly. In our case, the underlying dynamics are not necessarily deterministic
and do not need to be strictly contracting in a Wasserstein distance. One drawback is
that the constants $\lambda$ and $C$ are much less explicit. This theorem is a direct consequence
of the more general Theorem 3.2 below. These two theorems are our main result and,
contrary to the previous theorem, it seems that they cannot be deduced directly from
the approach of [BLMZ12c].
1.2 Two criteria with hypoellipticity assumption

In the previous subsection, we have supposed that some of the underlying dynamics 
contract at sufficiently high rate in a Wasserstein distance. This is of course not a 
necessary condition for geometric ergodicity in general. Using some arguments of the
proofs of Theorem 1.4 and Theorem 1.5, we can deduce a different criterion which uses
instead a Lyapunov-type argument to prove that $\mathbf{X}$ converges. We begin by stating an
assumption similar to Assumption 1.3.

Assumption 1.6 (Existence of a Lyapunov function). There exist $K \ge 0$, a function
$V \ge 0$, and for every $i \in F$ there exists $\lambda(i) \in \mathbb{R}$ such that

$$\forall t \ge 0, \ \forall x \in E, \quad P_t^{(i)} V(x) \le e^{-\lambda(i) t} V(x) + K. \tag{1.5}$$

Note again that we have not supposed that $\lambda(i) > 0$. One way to prove this kind
of bound is to use the classical drift condition on the generator (see (2.2) below). With
this assumption we are able to prove

Theorem 1.7 (Exponential ergodicity in the constant case). Suppose that Assumptions
1.1, 1.2 and 1.6 hold, that $a(x, i, j)$ does not depend on $x$ and that $I$ has an invariant
probability measure $\nu$ verifying

$$\sum_{i \in F} \nu(i) \lambda(i) > 0.$$

If there exists $i_0 \in F$ such that the sublevel sets of $V$ are small for $P^{(i_0)}$, then there
exist a probability measure $\pi$ and two constants $C, \lambda > 0$ such that

$$d_{TV}(\delta_{\mathbf{x}} \mathbf{P}_t, \pi) \le C e^{-\lambda t} (1 + V(x)),$$

for every $\mathbf{x} = (x, i) \in \mathbf{E}$.

The definition of a small set is recalled in Definition 2.9. We also give the analogue
of Theorem 1.5:

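Theorem 1.8 (Exponential ergodicity with an on-off type criterion). Let us suppose
that Assumptions 1.1, 1.2 and 1.3 hold. We set

$$F_0 = \{i \in F \mid \lambda(i) > 0\} \quad \text{and} \quad F_1 = \{i \in F \mid \lambda(i) \le 0\},$$

$$\lambda_0 = \min_{i \in F_0} \lambda(i) > 0 \quad \text{and} \quad \lambda_1 = \min_{i \in F_1} \lambda(i) \le 0,$$

$$a_0 = \max_{i \in F_0} \sup_{x \in E} \sum_{j \in F_1} a(x, i, j) \quad \text{and} \quad a_1 = \min_{i \in F_1} \inf_{x \in E} \sum_{j \in F_0} a(x, i, j).$$

If

$$\lambda_0 a_1 + \lambda_1 a_0 > 0,$$

and there exists $i_0 \in F$ such that the sublevel sets of $V$ are small for $P^{(i_0)}$, then there
exist a probability measure $\pi$ and two constants $C, \lambda > 0$ such that

$$d_{TV}(\delta_{\mathbf{x}} \mathbf{P}_t, \pi) \le C e^{-\lambda t} (1 + V(x)),$$

for every $\mathbf{x} = (x, i) \in \mathbf{E}$.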
Note that in general it is not necessary to assume that sublevel sets of $V$ are small
for any single one of the underlying dynamics. For example, using the results of [BH12,
BLMZ12b], Section 4.2 gives results analogous to the two previous theorems, in the
special case of piecewise deterministic Markov processes where the only small sets for
the underlying dynamics consist of single points.

The remainder of the paper is organised as follows. The proofs of our four main
theorems are split over two sections: Section 2 deals with the proof of Theorem 1.4
and Theorem 1.7. In Section 3, we begin by giving a more general assumption in
the non-constant case than our on-off criterion. Then, we introduce a weak form of
Harris' Theorem that we will use to prove Theorem 1.5. The proof of this theorem
is then decomposed in such a way as to verify each point of the weak Harris' Theorem.
Section 4.1 gives sufficient conditions to verify our main assumption in the special
case of diffusion processes. The section which follows deals with the special case of
switching dynamical systems. We conclude with Section 5, where we give some very
simple examples illustrating the sharpness of our conditions.

2 Constant jump rates 

In this section, we begin by proving that under Assumptions 1.3 or 1.6 the process $\mathbf{X}$
cannot wander off to infinity, i.e. its semigroup possesses a Lyapunov function. We
then prove Theorems 1.4 and 1.7 using a similar argument to [BLMZ12c] for the first
one and Harris' Theorem for the second one.

2.1 Construction of a Lyapunov function 

We begin by recalling the definition of a Lyapunov function.

Definition 2.1 (Lyapunov function). A Lyapunov function for a Markov semigroup
$(P_t)_{t \ge 0}$ over a Polish space $(X, d_X)$ is a function $V : X \to [0, \infty]$ such that $V$ is
integrable with respect to $P_t(x, \cdot)$ for every $x \in X$ and $t \ge 0$ and such that there exist
constants $C_V, \gamma, K_V > 0$ verifying

$$P_t V(x) = \int_X V(y) \, P_t(x, dy) \le C_V e^{-\gamma t} V(x) + K_V, \tag{2.1}$$

for every $x \in X$ and $t \ge 0$.

A well-known sufficient condition for finding a Lyapunov function is the following
drift condition:

$$\mathcal{L} V \le -\gamma V + C, \tag{2.2}$$

where $\mathcal{L}$ is the generator of the semigroup $(P_t)_{t \ge 0}$. The condition (2.2) implies a bound
like (1.5) and is clearly stronger than (2.1). In general, our switching Markov process
$\mathbf{X}$ may not verify the drift condition (2.2) but, in Lemmas 2.7 and 3.8, we give a sharp
condition under which it verifies (2.1). In this section, we first prove that a Wasserstein
contraction as in Assumption 1.3 implies the existence of a Lyapunov-type function as
in Assumption 1.6. Then, we will prove that Assumption 1.6 implies the existence of a
Lyapunov function for $\mathbf{X}$.
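
As a sanity check of how the drift condition (2.2) yields the bound (2.1), consider an example that is not in the paper: the one-dimensional Ornstein-Uhlenbeck process $dX_t = -\theta X_t \, dt + \sigma \, dB_t$ with $V(x) = x^2$. Its generator gives $\mathcal{L} V = -2\theta V + \sigma^2$, and the second moment is known in closed form, so (2.1) can be checked directly with $C_V = 1$, $\gamma = 2\theta$ and $K_V = \sigma^2 / (2\theta)$. The parameter values below are arbitrary.

```python
import math


def ou_second_moment(x, t, theta, sigma):
    """E[X_t^2] for dX = -theta X dt + sigma dB started at x (closed form)."""
    decay = math.exp(-2.0 * theta * t)
    return x * x * decay + sigma**2 / (2.0 * theta) * (1.0 - decay)


theta, sigma = 0.7, 1.3
gamma, C = 2.0 * theta, sigma**2   # drift condition: LV = -gamma V + C
K_V = C / gamma
lyapunov_bound_holds = all(
    ou_second_moment(x, t, theta, sigma)
    <= math.exp(-gamma * t) * x * x + K_V + 1e-12
    for x in (-3.0, 0.0, 0.5, 10.0)
    for t in (0.0, 0.1, 1.0, 25.0)
)
print(lyapunov_bound_holds)
```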

Lemma 2.2 (Wasserstein contraction implies the existence of a Lyapunov-type
function). Let $(P_t)_{t \ge 0}$ be the semigroup of a Markov process, on a Polish space $(X, d_X)$,
such that there exists $\lambda \in \mathbb{R}^*$ verifying

$$W_{d_X}(\delta_x P_t, \delta_y P_t) \le e^{-\lambda t} d_X(x, y), \tag{2.3}$$

for every $x, y \in X$ and $t \ge 0$. If there exist $x_0 \in X$ and $t_{x_0} > 0$ such that the function
$V_{x_0} : x \mapsto d_X(x, x_0)$ verifies

$$\sup_{t \in [0, t_{x_0}]} P_t V_{x_0}(x_0) < +\infty, \tag{2.4}$$

then there exist $C_1, C_2 > 0$ such that

$$P_t V_{x_0}(x) \le e^{-\lambda t} \left(V_{x_0}(x) + C_1\right) + C_2, \tag{2.5}$$

for every $x \in X$ and $t \ge 0$.

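Proof. For any $t > t_{x_0}$ and $n \ge 1$, it follows from (2.3) that

$$P_t V_{x_0}(x_0) = W_{d_X}(\delta_{x_0} P_t, \delta_{x_0}) \le \sum_{k=0}^{n-1} W_{d_X}(\delta_{x_0} P_{(k+1)t/n}, \delta_{x_0} P_{kt/n}) \le \sum_{k=0}^{n-1} e^{-\lambda k t/n} P_{t/n} V_{x_0}(x_0) = \frac{e^{-\lambda t} - 1}{e^{-\lambda t/n} - 1} P_{t/n} V_{x_0}(x_0).$$

Taking $n = \lfloor t/t_{x_0} \rfloor + 1$, where $\lfloor t/t_{x_0} \rfloor$ is the integer part of $t/t_{x_0}$, we have
$t/n \in [t_{x_0}/2, t_{x_0}]$ and we conclude that

$$P_t V_{x_0}(x_0) \le (e^{-\lambda t} + 1) C', \qquad C' = \sup_{s \in [t_{x_0}/2, \, t_{x_0}]} \frac{P_s V_{x_0}(x_0)}{|e^{-\lambda s} - 1|},$$

which is finite by (2.4). Finally, for every $x \in X$ and $t \ge 0$, we have

$$P_t V_{x_0}(x) = W_{d_X}(\delta_x P_t, \delta_{x_0}) \le W_{d_X}(\delta_x P_t, \delta_{x_0} P_t) + W_{d_X}(\delta_{x_0} P_t, \delta_{x_0}) \le e^{-\lambda t} V_{x_0}(x) + (e^{-\lambda t} + 1) C',$$

thus concluding the proof. □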

We deduce that Assumption 1.3 implies Assumption 1.6 with $V = V_{x_0}$ and $\lambda = \rho$.

Remark 2.3. The point of this lemma is to also allow for negative values of $\lambda$. When
$\lambda > 0$, then it is immediate that $P_t$ admits a unique invariant measure and exhibits
geometric ergodicity.

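Remark 2.4. If $V_{x_0}$ is in the domain of the generator $\mathcal{L}$ of $(P_t)_{t \ge 0}$ then we have

$$\forall t \ge 0, \quad P_t V_{x_0}(x_0) \le \frac{e^{-\lambda t} - 1}{e^{-\lambda t/n} - 1} P_{t/n} V_{x_0}(x_0),$$

for every $n \ge 1$. Now, taking the limit $n \to +\infty$, we deduce the following bound:

$$W_{d_X}(\delta_{x_0} P_t, \delta_{x_0}) \le \frac{e^{-\lambda t} - 1}{-\lambda} \mathcal{L} V_{x_0}(x_0).$$

Finally, for every $x \in X$, we have

$$P_t V_{x_0}(x) = W_{d_X}(\delta_x P_t, \delta_{x_0}) \le W_{d_X}(\delta_x P_t, \delta_{x_0} P_t) + W_{d_X}(\delta_{x_0} P_t, \delta_{x_0}) \le e^{-\lambda t} V_{x_0}(x) + \frac{e^{-\lambda t} - 1}{-\lambda} \mathcal{L} V_{x_0}(x_0).$$

However, $V_{x_0}$ does not belong to the domain of the generator in general, as can be seen
already in the example of simple Brownian motion.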
Remark 2.5 (The special case $\lambda = 0$). In the previous lemma, we have supposed that
$\lambda \ne 0$, and this assumption is necessary for our conclusion to hold. Indeed, if $(B_t)_{t \ge 0}$
is a Brownian motion then

$$\lim_{t \to +\infty} \mathbb{E}[|B_t|] = +\infty,$$

and inequality (2.5) does not hold.
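
The divergence in this remark can be made quantitative for a standard one-dimensional Brownian motion started at the origin, for which $\mathbb{E}|B_t| = \sqrt{2t/\pi}$. The short sketch below (an illustration, with a hypothetical function name) evaluates this closed form and shows the growth in $\sqrt{t}$, which no bound of the form (2.5) with $\lambda = 0$ can accommodate.

```python
import math


def mean_abs_brownian(t):
    """E|B_t| for a standard one-dimensional Brownian motion with B_0 = 0."""
    return math.sqrt(2.0 * t / math.pi)


for t in (1.0, 100.0, 10_000.0):
    print(t, mean_abs_brownian(t))
```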

We now show that if Assumption 1.6 holds and the mean of $(\lambda(i))_{i \in F}$ is positive,
then $\mathbf{X}$ admits a Lyapunov function. As in [BLMZ12c], this result comes from the
following lemma:

Lemma 2.6. Let $(K_t)$ be a continuous time Markov chain on a finite set $S$, and assume
that it is irreducible and positive recurrent with invariant measure $\nu_K$. If $\alpha : S \to \mathbb{R}$ is
a function verifying

$$\sum_{n \in S} \nu_K(n) \alpha(n) > 0,$$

then there exist $C, c, \eta > 0$ and $p \in (0, 1]$ such that

$$c e^{-\eta t} \le \mathbb{E}\left[ e^{-\int_0^t p \alpha(K_s) \, ds} \right] \le C e^{-\eta t}.$$

Proof. It is a consequence of the Perron-Frobenius Theorem and the study of eigenvalues.
See [BGM10, Proposition 4.1] and [BGM10, Proposition 4.2] for further details. □
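
The role of the exponent $p$ in this lemma can be seen on a two-state chain. By the Feynman-Kac formula, the expectation $\mathbb{E}[e^{-\int_0^t p \alpha(K_s) ds}]$ grows like $e^{\theta(p) t}$, where $\theta(p)$ is the largest eigenvalue of $Q - p \, \mathrm{diag}(\alpha)$ and $Q$ is the rate matrix of the chain. The sketch below (an illustration with hypothetical rates, not the argument of [BGM10]) evaluates $\theta(p)$ in closed form for a $2 \times 2$ matrix.

```python
import math


def growth_rate(a01, a10, alpha, p):
    """Largest eigenvalue of Q - p * diag(alpha) for a two-state chain.

    a01 and a10 are the jump rates 0 -> 1 and 1 -> 0. By the Feynman-Kac
    formula, E[exp(-p * int_0^t alpha(K_s) ds)] grows like exp(rate * t),
    so eta = -rate in the notation of the lemma.
    """
    m00 = -a01 - p * alpha[0]
    m11 = -a10 - p * alpha[1]
    disc = math.sqrt((m00 - m11) ** 2 + 4.0 * a01 * a10)
    return 0.5 * (m00 + m11 + disc)


alpha = (3.0, -1.0)   # mean under nu = (1/2, 1/2) is 1 > 0
rate_p1 = growth_rate(1.0, 1.0, alpha, 1.0)
rate_small_p = growth_rate(1.0, 1.0, alpha, 0.25)
print(rate_p1)        # positive: no exponential decay for p = 1
print(rate_small_p)   # negative: exponential decay for small p
```

Even though $\alpha$ has positive mean, the excursions in the state where $\alpha < 0$ dominate the expectation when $p = 1$; lowering $p$ restores the decay, which is why the lemma only claims the existence of some $p \in (0, 1]$.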

Now we are able to prove that $\mathbf{P}$ possesses a Lyapunov function in the case where
the switching rates do not depend on the location of the process.

Lemma 2.7. Under Assumptions 1.1, 1.2 and 1.6, if $a(x, i, j)$ does not depend on $x$ and
$I$ has an invariant measure $\nu$ satisfying

$$\sum_{i \in F} \lambda(i) \nu(i) > 0,$$

then there exist $C_V, K_V, \gamma > 0$ and $q \in (0, 1]$ such that

$$\forall t \ge 0, \ \forall \mathbf{x} = (x, i) \in \mathbf{E}, \quad \mathbf{P}_t V^q(x, i) \le C_V e^{-\gamma t} V^q(x) + K_V.$$

In the previous lemma, we used a slight abuse of notation. Indeed, if $f$ is a function
defined on $E$, we also denote by $f$ the mapping $(x, i) \mapsto f(x)$ on $\mathbf{E}$.

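Proof. First, Jensen's inequality gives this weaker form of (1.5):

$$P_t^{(i)}(V^q)(x) \le e^{-q \lambda(i) t} V^q(x) + K^q,$$

for every $q \in (0, 1]$. Now, for all $t \ge 0$ and $(x, i) \in \mathbf{E}$, a straightforward recurrence
gives

$$\mathbf{P}_t V^q(x, i) = \mathbb{E}\left[ P^{(I_0)}_{T_1 - T_0} \circ \cdots \circ P^{(I_{T_{N_t}})}_{t - T_{N_t}} (V^q)(x) \right] \le \mathbb{E}\left[ e^{-\int_0^t q \lambda(I_s) \, ds} \right] V^q(x) + K^q \sum_{n \ge 0} \mathbb{E}\left[ \mathbf{1}_{T_n \le t} \, e^{-\int_{T_{n+1} \wedge t}^t q \lambda(I_s) \, ds} \right],$$

where $(T_k)_{k \ge 0}$ is the sequence of jump times of $I$, with $T_0 = 0$, and $N_t$ the number of
jumps before $t$. By Lemma 2.6, there exist $C > 0$, $\eta > 0$ and $q \in (0, 1]$ such that

$$\mathbb{E}\left[ e^{-\int_0^t q \lambda(I_s) \, ds} \right] \le C e^{-\eta t}.$$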
Furthermore, one can show that $T_n$ is of order $n$ and that

Ti>0 n>0 

for some $\varepsilon > 0$. We do not detail this argument now, but we will prove it in the
slightly more difficult context of non-constant rate $a$ in Lemma 3.8. This concludes the
proof. □

Remark 2.8 (On the assumption that $F$ is finite). It is natural to extend our results to
the case where $F$ is countably infinite. Obviously, we then have to add the assumption
that $I$ is positive recurrent, but this is not enough. Indeed, if for each $i \in F$, $C_1(i)$ and
$C_2(i)$ denote the constants $C_1, C_2$ appearing in Lemma 2.2 applied to $Z^{(i)}$, then we
should furthermore assume that

$$\sup_{i \in F} \left(C_1(i) + C_2(i)\right) < +\infty,$$

for the argument to go through.

2.2 Proof of Theorem 1.4

This section is split into two parts. We begin by introducing our coupling
construction, and we then proceed to prove Theorem 1.4. In both parts, we make the standing
assumption that the hypotheses of Theorem 1.4 hold. In particular, $I$ is an ergodic
Markov chain.



2.2.1 Our coupling 

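Let $\mathbf{x} = (x, i)$ and $\mathbf{y} = (y, j)$ be two points of $\mathbf{E}$. We will build a coupling $(\mathbf{X}, \mathbf{Y})$,
starting from $(\mathbf{x}, \mathbf{y})$, such that each component is an instance of the Markov process
generated by $\mathbf{L}$, and such that

$$\forall t \ge 0, \quad \mathbb{E}\left[ \tilde{\mathbf{d}}(\mathbf{X}_t, \mathbf{Y}_t) \right] \le C e^{-\alpha t} \, \tilde{\mathbf{d}}(\mathbf{x}, \mathbf{y}),$$

for some $C, \alpha > 0$. Here the "distance function" $\tilde{\mathbf{d}}$ is defined by

$$\tilde{\mathbf{d}}(\mathbf{x}, \mathbf{y}) = \sqrt{\left( \mathbf{1}_{i = j} \, (d(x, y)^q \wedge 1) + \mathbf{1}_{i \ne j} \right) \left( 1 + d(x, x_0)^q + d(y, x_0)^q \right)}.$$

Here, we put "distance function" in quotation marks since $\tilde{\mathbf{d}}$ does not in general satisfy
the triangle inequality. Obviously, we have

$$\mathbf{d}(\mathbf{x}, \mathbf{y}) \le \tilde{\mathbf{d}}(\mathbf{x}, \mathbf{y}) \le \sqrt{1 + d(x, x_0)^q + d(y, x_0)^q}.$$

Note that it is well known that if $I$ and $J$ are two independent processes with
transition rate $a$ then there exists $\theta_c > 0$ such that

$$\forall t \ge 0, \quad \mathbb{P}(T_c > t) \le e^{-\theta_c t},$$

where $T_c = \inf\{t \ge 0 \mid I_t = J_t\}$ is their first meeting time. From now on, we fix the
starting points of our coupling $\mathbf{x} = (x, i)$, $\mathbf{y} = (y, j)$ and the time $t \ge 0$. The processes
$(\mathbf{X}_t)_{t \ge 0} = (X_t, I_t)_{t \ge 0}$ and $(\mathbf{Y}_t)_{t \ge 0} = (Y_t, J_t)_{t \ge 0}$ are then coupled as follows: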
• if $i \ne j$ then we consider that $\mathbf{X}$ and $\mathbf{Y}$ evolve independently of each other until
the first meeting time $T_c$;

• for all $s \ge T_c$, we set $I_s = J_s$ and we couple $X$ and $Y$ in such a way that

$$\forall k > 0, \quad \mathbb{E}\left[ d(X_{S_k}, Y_{S_k}) \mid \mathcal{F}_{S_{k-1}} \right] \le e^{-\rho(I_{S_{k-1}}) (S_k - S_{k-1})} d(X_{S_{k-1}}, Y_{S_{k-1}}),$$

where $(T_k)_{k \ge 0}$ is the sequence of jump times of $I$, $S_k = T_k \wedge t$ and $(\mathcal{F}_s)_{s \ge 0}$ is
the natural filtration associated to $(\mathbf{X}, \mathbf{Y})$.

Note that if $i = j$ then $T_c = 0$.

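Proof of Theorem 1.4. By Lemma 2.2, it suffices to show that

$$\mathbb{E}\left[ \tilde{\mathbf{d}}(\mathbf{X}_t, \mathbf{Y}_t) \right] \le C e^{-\lambda t} \left(1 + d^q(x_0, y_0)\right).$$

If $i = j$, then by Jensen's inequality and iteration, we have similarly to before

$$\mathbb{E}\left[ d(X_t, Y_t)^q \right] \le \mathbb{E}\left[ e^{-\int_0^t q \rho(I_s) \, ds} \right] d(x, y)^q,$$

where $q \in (0, 1]$. By Lemma 2.6, there exist $C, \eta > 0$ and $q \in (0, 1]$ such that

$$\mathbb{E}\left[ d(X_t, Y_t)^q \right] \le C e^{-\eta t} d(x, y)^q.$$

Now, for general $i$ and $j$, we have

$$\mathbb{E}\left[ \tilde{\mathbf{d}}(\mathbf{X}_t, \mathbf{Y}_t) \right] \le \mathbb{E}\left[ \mathbf{1}_{T_c > t/2} \left(1 + V^q(X_t) + V^q(Y_t)\right)^{1/2} \right] + \mathbb{E}\left[ \mathbf{1}_{T_c \le t/2} \, d(X_t, Y_t)^{q/2} \left(1 + V^q(X_t) + V^q(Y_t)\right)^{1/2} \right],$$

where $V(x) = d(x, x_0)$. Now, the Cauchy-Schwarz inequality, Lemma 2.2 and Lemma 2.7
give

$$\mathbb{E}\left[ \mathbf{1}_{T_c > t/2} \left(1 + V^q(X_t) + V^q(Y_t)\right)^{1/2} \right] \le \mathbb{P}(T_c > t/2)^{1/2} \, \mathbb{E}\left[ 1 + V^q(X_t) + V^q(Y_t) \right]^{1/2} \le e^{-\theta_c t/4} \left( 1 + C_V e^{-\gamma t} (V^q(x) + V^q(y)) + 2 K_V \right)^{1/2}.$$

On the other hand, one has the bound

$$\mathbb{E}\left[ \mathbf{1}_{T_c \le t/2} \, d(X_t, Y_t)^{q/2} \left(1 + V^q(X_t) + V^q(Y_t)\right)^{1/2} \right] \le \mathbb{E}\left[ \mathbf{1}_{T_c \le t/2} \, d(X_t, Y_t)^q \right]^{1/2} \mathbb{E}\left[ 1 + V^q(X_t) + V^q(Y_t) \right]^{1/2}.$$

As a consequence of Lemmas 2.2 and 2.7 we also have the bound

$$\mathbb{E}\left[ \mathbf{1}_{T_c \le t/2} \, d(X_t, Y_t)^q \right]^{1/2} \le C e^{-\eta t/2} \, \mathbb{E}\left[ d(X_{T_c}, Y_{T_c})^q \mathbf{1}_{T_c \le t/2} \right]^{1/2} \le C e^{-\eta t/2} \, \mathbb{E}\left[ \left( V(X_{T_c})^q + V(Y_{T_c})^q \right) \mathbf{1}_{T_c \le t/2} \right]^{1/2}. \tag{2.6}$$

Assembling these inequalities and using again Lemma 2.7 to bound the second factor
in (2.6), the claim follows. □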
2.3 Proof of Theorem 1.7

We again divide the proof into two parts. First, we recall some tools on Harris' Theorem.
Second, we give the proof of Theorem 1.7.

2.3.1 Classical Harris' Theorem 

Here, we recall a version of Harris' Theorem (also called Foster, Lyapunov, Meyn- 
Tweedie, Doeblin) that is suitable for our needs. This theorem yields exponential con- 
vergence to stationarity for a process which does not "escape to infinity" and verifies 
furthermore a Doeblin-type condition. More precisely, we use the following notion of 
a small set: 

Definition 2.9. A set $A \subset X$ is small for the semigroup $(P_t)_{t \ge 0}$ over a Polish space
$(X, d_X)$, if there exists a time $t > 0$ and a constant $\varepsilon > 0$ such that

$$d_{TV}(\delta_x P_t, \delta_y P_t) \le 1 - \varepsilon$$

for every $x, y \in A$.

The classical Harris theorem [HM11, MT93] then states that

Theorem 2.10 (Harris). Let $(P_t)_{t \ge 0}$ be a Markov semigroup over a Polish space
$(X, d_X)$ such that there exists a Lyapunov function $V$ with the additional property that
the sublevel sets $\{x \in X \mid V(x) \le C\}$ are small for every $C > 0$. Then $(P_t)_{t \ge 0}$ has a
unique invariant measure $\pi$ and

$$d_{TV}(\delta_x P_t, \pi) \le C e^{-\gamma_* t} (1 + V(x)),$$

for some positive constants $C$ and $\gamma_*$.

Note that one does not really need that all sublevel sets are small and one can have 
a slightly stronger conclusion by using a total variation distance weighted by V. 

Proof of Theorem 1.7. By Lemma 2.7, $\mathbf{P}$ admits $V$ as Lyapunov function so, by Harris'
Theorem, it only remains to show that $\{V \le C\}$ is small for $\mathbf{P}$, for every $C > 0$.
Since $V$ is a Lyapunov function, there exist $t^{(1)} > 0$ and $K > K_V$ (with $K_V$ as in
Lemma 2.7) such that

$$\forall t \ge t^{(1)}, \quad \mathbb{E}[V(X_t)] \le K,$$

uniformly over all $\mathbf{x} \in \mathbf{E}$ such that $V(x) \le C$. Therefore, if $\mathbf{X}$ is a process generated
by $\mathbf{L}$, it follows from Markov's inequality that

$$\mathbb{P}(V(X_t) \le 2K) \ge \frac{1}{2},$$

uniformly over $t \ge t^{(1)}$.

Let now $i_0 \in F$ be as in the statement. Since $A = \{V \le 2K\}$ is small for $P^{(i_0)}$,
we obtain some $t_0 > 0$ and $\varepsilon > 0$, such that for all $x, y \in A$ there exists a coupling
$(Z^{i_0, x}_t, Z^{i_0, y}_t)$ verifying

$$\mathbb{P}\left( Z^{i_0, x}_t = Z^{i_0, y}_t \right) \ge \varepsilon, \quad t \ge t_0, \tag{2.7}$$

and having respective laws $\delta_x P^{(i_0)}_t$, $\delta_y P^{(i_0)}_t$.

By the irreducibility of the process $I$, one can find $t_* > t^{(1)}$ and $\delta > 0$ such that
$\mathbb{P}(I_s = i_0, \forall s \in [t_*, t_* + t_0]) \ge \delta$, uniformly over the starting distributions. Let now
$(\mathbf{X}_t, \mathbf{Y}_t)$ be the following coupling:

• the Markov chains $I$ and $J$ are independent over $t \in [0, t_* + t_0]$;

• the processes $X$ and $Y$ are independent over $t \in [0, t_*]$;

• conditionally on the set

$$B = \{V(X_{t_*}) \le 2K, \ V(Y_{t_*}) \le 2K, \ I_s = J_s = i_0, \ \forall s \in [t_*, t_* + t_0]\},$$

the processes $X$ and $Y$ are coupled in such a way as to verify (2.7) over $t \in [t_*, t_* + t_0]$;

• conditionally on $B^c$, they are coupled independently from each other.

The Markov property gives

$$\mathbb{P}\left( V(X_{t_*}) \le 2K, \ I_s = i_0, \ \forall s \in [t_*, t_* + t_0] \right) \ge \frac{\delta}{2},$$

and so $\mathbb{P}(B) \ge \delta^2/4$. Combining this inequality with (2.7), we conclude that
$\mathbb{P}(X_{t_* + t_0} = Y_{t_* + t_0}) \ge \varepsilon \delta^2/4$ uniformly over all initial conditions $\mathbf{x}$ and $\mathbf{y}$ with
$V(x) \le C$ and $V(y) \le C$, as required. □

3 Non-constant jump rates 

In all of this section, we now assume that $a$ depends non-trivially on its first component,
so that $I$ by itself is not a Markov process anymore. We want to use again Lemma 2.6
to show that $\mathbf{X}$ converges, but this time we cannot use it directly on $I$. The idea is
to consider an auxiliary process which does not depend on $X$ and which will bound
$(\rho(I_t))_{t \ge 0}$ or $(\lambda(I_t))_{t \ge 0}$. More precisely, we will assume

Assumption 3.1 (Birth-death type criterion in the non-constant case). There exist
$\bar{n} \in \mathbb{N}$ and a partition $(F_n)_{0 \le n \le \bar{n}}$ of $F$ such that

$$\forall n \le \bar{n}, \ \forall i \in F_n, \ \forall j \notin F_{n-1} \cup F_n \cup F_{n+1}, \ \forall x \in E, \quad a(x, i, j) = 0.$$

Let $(L_t)_{t \ge 0}$ be the continuous time Markov chain with generator

$$Gf(n) = b(n) \left(f(n+1) - f(n)\right) + d(n) \left(f(n-1) - f(n)\right), \tag{3.1}$$

for every $n \le \bar{n}$, where

$$b(n) = \inf_{x \in E} \inf_{i \in F_n} \sum_{j \in F_{n+1}} a(x, i, j) > 0,$$

and

$$d(n) = \sup_{x \in E} \sup_{i \in F_n} \sum_{j \in F_{n-1}} a(x, i, j) > 0,$$

if $n \ne 0$ and $d(0) = 0$. This process is irreducible, non-explosive and positive recurrent
with invariant measure $\nu$.

If this assumption holds then, for every $i \in F$, we denote by $n_i$ the only $n \le \bar{n}$
verifying $i \in F_n$. Let us recall that $\nu$ is defined, for every $n \le \bar{n}$, by

$$\nu(n) = \nu(0) \prod_{k=1}^{n} \frac{b(k-1)}{d(k)} \quad \text{and} \quad \nu(0) = (1 + S)^{-1},$$
where

$$S = \sum_{n=1}^{\bar{n}} \frac{b(0) \cdots b(n-1)}{d(1) \cdots d(n)}.$$
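
The product formula for the invariant measure of the birth-death chain is easy to evaluate numerically. The following sketch (with hypothetical rates and a hypothetical function name) computes $\nu$ from the sequences $b$ and $d$ and evaluates a weighted sum of the form $\sum_n \nu(n) \alpha(n)$ in the on-off case with two levels.

```python
def birth_death_invariant(b, d):
    """Invariant probability of a birth-death chain on {0, ..., N}.

    b[n] is the rate n -> n + 1 for n < N and d[n] the rate n -> n - 1
    for n >= 1 (d[0] is ignored). Implements
    nu(n) = nu(0) * prod_{k=1}^{n} b(k-1) / d(k).
    """
    weights = [1.0]
    for k in range(1, len(d)):
        weights.append(weights[-1] * b[k - 1] / d[k])
    total = sum(weights)            # equals 1 + S
    return [w / total for w in weights]


# Two levels (N = 1) with hypothetical rates b(0) = 6 and d(1) = 2.
nu = birth_death_invariant(b=[6.0], d=[0.0, 2.0])
alpha = [-0.5, 1.0]                 # increasing sequence alpha(0) <= alpha(1)
criterion = sum(p * a for p, a in zip(nu, alpha))
print(nu, criterion)                # nu = [0.25, 0.75], criterion = 0.625
```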

Now we can state two slight generalisations of Theorems 1.5 and 1.8. The first one is

Theorem 3.2 (Wasserstein exponential ergodicity). Let us suppose that Assumptions
1.1, 1.2, 1.3 and 3.1 hold. If

$$\sum_{n=0}^{\bar{n}} \nu(n) \alpha(n) > 0,$$

where $(\alpha(n))_{n \ge 0}$ is an increasing sequence verifying $\alpha(n) \le \inf_{i \in F_n} \rho(i)$, then there
exist a probability measure $\pi$ and some constants $C, \lambda, t_0 > 0$ and $q \in (0, 1]$ such that

$$\forall t \ge t_0, \quad W_{\mathbf{d}}(\delta_{\mathbf{y}_0} \mathbf{P}_t, \pi) \le C e^{-\lambda t} \left(1 + W_{d^q}(\delta_{\mathbf{y}_0}, \pi)\right),$$

for every $\mathbf{y}_0 = (y_0, i_0) \in \mathbf{E}$, where the distance $\mathbf{d}$, on $\mathbf{E}$, is defined in (1.4), $x_0$ is as in
Assumption 1.3.

If Assumption 3.1 holds with $\bar{n} = 0$ then all contraction parameters are positive
and we recover [BLMZ12c, Theorem 1.15]. If it holds with $\bar{n} = 1$, then we have the
on-off criterion which was given in the introduction. We can also state the analogous result
in the setting of Theorem 1.8.

Theorem 3.3 (Exponential ergodicity). Let us suppose that Assumptions 1.1, 1.2, 1.3
and 3.1 hold and there exists $i_0 \in F$ such that the sublevel sets of $V$ are small for $P^{(i_0)}$.
If

$$\sum_{n=0}^{\bar{n}} \nu(n) \alpha(n) > 0,$$

where $(\alpha(n))_{n \ge 0}$ is an increasing sequence verifying $\alpha(n) \le \inf_{i \in F_n} \lambda(i)$, then there
exist a probability measure $\pi$ and two constants $C, \lambda > 0$ such that

$$d_{TV}(\delta_{\mathbf{x}} \mathbf{P}_t, \pi) \le C e^{-\lambda t} (1 + V(x))$$

for every $\mathbf{x} = (x, i) \in \mathbf{E}$.

We do not give the proofs of Theorem 1.8 and Theorem 3.3 as they are very
similar to the proof of Theorem 1.7 combined with the argument of Lemma 3.8 below.
To prove Theorem 3.2, however, we cannot use the classical Harris' Theorem. Its proof
follows the same idea as the proof of Theorem 1.4 but there is no direct equivalent
to the meeting time. Instead, we use a weak version of Harris' Theorem which yields
geometric ergodicity under the existence of a Lyapunov function and a modified "small
set" condition. This theorem was previously applied to the stochastic Navier-Stokes
equation [HM08], stochastic delay differential equations [HMS11], and linear response
theory [HM10]. It is an extension of the classical Harris' Theorem which makes it possible to deal
with some degenerate examples like the one given in (1.1).

3.1 Weak form of Harris' Theorem 

As already mentioned earlier, there are situations in which we cannot expect
convergence in total variation. The problem here is that bounded sets may not be small sets.
We will therefore replace the notion of small set by the following notion of "closedness"
between transition probabilities introduced in [HMS11], which takes into account the
topology of the underlying space $X$.
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Definition 3.4 (d-small set). Let P be a Markov operator over a Polish space X en- 
dowed with a distance dx : X x X i-^ [0,1]. A set A d X is said to be dx -small if 
there exists a constant e such that 

mASxP,SyP)<l-e, 

for every x,y £ A. 

This notion is a generalisation of the notion of small set, since small sets are d-small for the trivial distance. This definition can also be extended to situations when d is not a distance [HMS11]. As remarked in that paper, having a Lyapunov function V with d-small sublevel sets cannot be sufficient to imply the ergodicity of a Markov semigroup. To obtain some convergence result, we further impose that d is contracting for our semigroup:

Definition 3.5 (d-contracting operator). Let $P$ be a Markov operator over a Polish space $X$ endowed with a distance $d_X : X \times X \to [0,1]$. The distance $d_X$ is said to be contracting for $P$ if there exists $\alpha < 1$ such that the bound

\[ \mathcal{W}_{d_X}(\delta_x P, \delta_y P) \le \alpha\, d_X(x,y) \]

holds for every $x, y \in X$ verifying $d_X(x,y) < 1$.

Note that this condition alone is not sufficient to guarantee the convergence of transition probabilities toward a unique invariant measure since we only impose a contraction when $d(x,y) < 1$. In typical situations, "most" pairs $(x,y)$ may satisfy $d(x,y) = 1$, as would be the case for the total variation distance. However, when combined with the existence of a Lyapunov function $V$ that has d-small sublevel sets, it gives geometric ergodicity [HMS11, Theorem 4.7]:

Theorem 3.6 (Weak form of Harris' Theorem). Let $(P_t)_{t \ge 0}$ be a Markov semigroup over a Polish space $X$ admitting a continuous Lyapunov function $V$. Assume furthermore that there exist $t^* > t_* > 0$ and a distance $d_X : X \times X \to [0,1]$ which is contracting for $P_t$ and such that the sublevel set $\{x \in X \mid V(x) \le 4K_V\}$ is $d_X$-small for $P_t$, for every $t \in [t_*, t^*]$. Here $K_V$ is as in Definition 2.1. Then $(P_t)_{t \ge 0}$ has an invariant probability measure $\pi$. Furthermore, defining

\[ \tilde{d}_X(x,y) = \sqrt{d_X(x,y)\,(1 + V(x) + V(y))}, \]

there exist $r > 0$ and $t_0 > 0$ such that

\[ \forall t \ge t_0, \quad \mathcal{W}_{\tilde{d}_X}(\mu P_t, \nu P_t) \le e^{-rt}\, \mathcal{W}_{\tilde{d}_X}(\mu, \nu), \]

for all probability measures $\mu, \nu$ on $X$.

Remark 3.7 (On the contracting distances). The main difficulty when applying the 
previous theorem is to find a contracting distance. The construction of this distance 
represents the main part of our paper. In [HM10], there is a general way to build a
contracting distance of a Markov operator P over a Banach space (B, || • ||), based on a 
gradient estimate for P and the existence of a super-Lyapunov function. This technique 
was efficient in [HM08, HM10, HMS11].

3.2 Construction of a Lyapunov function

As in the constant case, we first show that if each underlying Markov process verifies a weaker form of the drift condition (2.2) then $\mathbf{X}$ possesses a Lyapunov function:

Lemma 3.8 (Construction of a Lyapunov function). Let us suppose that Assumptions 1.1, 1.2, 1.3 and 3.1 hold. If

\[ \sum_{n \ge 0} \nu(n)\,\alpha(n) > 0, \]

where $(\alpha(n))_{n \ge 0}$ is an increasing sequence verifying $\alpha(n) \le \inf_{i \in F_n} \lambda(i)$, then there exist $C_V, K_V, \lambda > 0$ and $q \in (0,1)$ such that

\[ \forall t \ge 0,\ \forall (x,i) \in \mathbf{E}, \quad \mathbf{P}_t V^q(x,i) \le C_V e^{-\lambda t} V^q(x) + K_V. \]

Proof. Recall again that Jensen's inequality gives this weaker form of (1.5):

\[ P_t^{(i)}(V^q)(x) \le e^{-q\lambda(i)t}\, V^q(x) + K^q, \]

for every $x \in E$ and $q \in (0,1]$. Now, we describe a construction of $\mathbf{X}$ which gives a better control of the jump mechanism. Let $r > 2\bar{a}$ and $(N_t)_{t \ge 0}$ be a Poisson process of intensity $r$; namely

\[ N_t = \sum_{n \ge 1} 1_{\{\tau_n \le t\}}, \]

where $\tau_n = \sum_{k=1}^{n} E_k$ and $(E_k)_{k \ge 0}$ is a family of i.i.d. exponentially distributed random variables with mean $1/r$. We set $\tau_0 = 0$. At this stage, we do not fix the value of $r$, but we allow ourselves the freedom to tune it at the end of the proof. We will couple $\mathbf{X} = (X, I)$ with a process $L$ that has generator (3.1). Let us fix $n \in \mathbb{N}$; on $[\tau_n, \tau_{n+1}]$, the process $(\mathbf{X}, L)$ is built as follows:

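• conditionally on $\mathbf{X}_{\tau_n}$, $(L_s)_{s \ge 0}$, $(\tau_k)_{k \ge 0}$, the process $(X_s)_{s \in [\tau_n, \tau_{n+1}]}$ moves as $(Z^{(I_{\tau_n})}_{t - \tau_n})_{t \in [\tau_n, \tau_{n+1}]}$ starting from $X_{\tau_n}$; more precisely,

\[ \mathbb{E}\left[ f(X_t)\, 1_{t \in [\tau_n, \tau_{n+1}]} \mid \mathcal{G}_n \right] = P^{(I_{\tau_n})}_{t - \tau_n} f(X_{\tau_n}), \]

where $f$ is a continuous and bounded function and $\mathcal{G}_n = \sigma\{\mathbf{X}_{\tau_n}, (L_s)_{s \ge 0}, (\tau_k)_{k \ge 0}\}$;

• on $[\tau_n, \tau_{n+1})$, the discrete processes $I$ and $L$ remain constant;

• at time $\tau_{n+1}$, we consider a Bernoulli random variable $B$ with parameter $1/2$, independent of the previous variables, and we have two situations:

  - if $n_{I_{\tau_n}} \neq L_{\tau_n}$ then

    * if $B = 0$ then $I$ does not jump but $L$ can jump,

    * if $B = 1$ then $L$ does not jump but $I$ can jump;

  - if $n_{I_{\tau_n}} = L_{\tau_n}$ then

    * if $B = 0$ then neither $I$ nor $L$ can jump,

    * if $B = 1$ then we have the following possibilities:

      · $L_{\tau_{n+1}} = L_{\tau_n} + 1$ and $I_{\tau_{n+1}} \in F_{n_{I_{\tau_n}} + 1}$,
      · $L_{\tau_{n+1}} = L_{\tau_n}$ and $I_{\tau_{n+1}} \in F_{n_{I_{\tau_n}}} \cup F_{n_{I_{\tau_n}} + 1}$,
      · $L_{\tau_{n+1}} = L_{\tau_n} - 1$ and $I_{\tau_{n+1}} \in F_{n_{I_{\tau_n}}} \cup F_{n_{I_{\tau_n}} - 1}$.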

Here, the respective probabilities of those jumps that are admissible are chosen in such a way that $\mathbf{X}$ and $L$ taken separately are indeed Markov processes with respective generators $\mathbf{L}$ and $G$.
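The Poisson clock $(N_t)_{t \ge 0}$ driving this construction can be sketched as follows: the jump times are cumulative sums of i.i.d. exponential variables with mean $1/r$, and $N_t$ counts the jump times up to $t$. This is a minimal Python sketch with hypothetical function names; it simulates only the clock, not the coupled process $(\mathbf{X}, L)$.

```python
import random


def poisson_jump_times(r, horizon, rng):
    """Jump times tau_1 < tau_2 < ... <= horizon of a Poisson process of intensity r."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(r)  # E_k is exponential with mean 1/r
        if t > horizon:
            return times
        times.append(t)


def count_at(times, t):
    """N_t: number of jump times tau_n with tau_n <= t."""
    return sum(1 for tau in times if tau <= t)


rng = random.Random(1)
times = poisson_jump_times(r=5.0, horizon=1000.0, rng=rng)
# By the law of large numbers, N_t / t should be close to r for large t.
print(count_at(times, 1000.0) / 1000.0)
```

The ratio $N_t / t \approx r$ is the heuristic $\tau_n \approx n/r$ used later in the proof.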

In words, if $L \neq n_I$, $L$ and $\mathbf{X}$ move independently from each other until the time where $n_I$ and $L$ agree. After that time, it is guaranteed that one always has $n_I \ge L$. To be concise, we have not detailed precisely where $I$ jumps. But, if we ignore $N$, the couple $(\mathbf{X}, L)$ is just the Markov process generated by

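\begin{align*}
\mathcal{G}f(x,i,l) ={}& \mathcal{L}^{(i)} f(x,i,l) + \sum_{j \in F} a(x,i,j)\,\big(f(x,j,l) - f(x,i,l)\big) \\
&+ b(l)\,\big(f(x,i,l+1) - f(x,i,l)\big) + d(l)\,\big(f(x,i,l-1) - f(x,i,l)\big),
\end{align*}

if $l \neq n_i$ and

\begin{align*}
\mathcal{G}f(x,i,n_i) ={}& \mathcal{L}^{(i)} f(x,i,n_i) \\
&+ \sum_{j \in F_{n_i - 1}} a(x,i,j)\,\big(f(x,j,n_i - 1) - f(x,i,n_i)\big) \\
&+ \Big(d(n_i) - \sum_{j \in F_{n_i - 1}} a(x,i,j)\Big)\big(f(x,i,n_i - 1) - f(x,i,n_i)\big) \\
&+ \sum_{j \in F_{n_i}} a(x,i,j)\,\big(f(x,j,n_i) - f(x,i,n_i)\big) \\
&+ \frac{b(n_i)}{\sum_{k \in F_{n_i + 1}} a(x,i,k)} \sum_{j \in F_{n_i + 1}} a(x,i,j)\,\big(f(x,j,n_i + 1) - f(x,i,n_i)\big) \\
&+ \Big(1 - \frac{b(n_i)}{\sum_{k \in F_{n_i + 1}} a(x,i,k)}\Big) \sum_{j \in F_{n_i + 1}} a(x,i,j)\,\big(f(x,j,n_i) - f(x,i,n_i)\big).
\end{align*}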



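Now, for all $t \ge 0$ and $\mathbf{x} = (x,i) \in \mathbf{E}$, we have

\begin{align*}
\mathbf{P}_t V^q(\mathbf{x}) &\le \mathbb{E}\left[ e^{-q\lambda(I_{\tau_{N_t}})(t - \tau_{N_t})}\, V^q(X_{\tau_{N_t}}) + K^q \right] \\
&\le \mathbb{E}\left[ e^{-\int_0^t q\alpha(L_s)\,ds} \right] V^q(x) + K^q \sum_{n \ge 0} \mathbb{E}\left[ e^{-q\int_0^{\tau_n} \alpha(L_s)\,ds} \right]. \tag{3.2}
\end{align*}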



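Now, using Lemma 2.6, there exist $C, \eta > 0$ and $q \in (0,1]$ such that

\[ \forall t \ge 0, \quad \mathbb{E}\left[ e^{-\int_0^t q\alpha(L_s)\,ds} \right] \le C e^{-\eta t}. \tag{3.3} \]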



Hence, it only remains to prove that

\[ \sum_{n \ge 0} \mathbb{E}\left[ e^{-q\int_0^{\tau_n} \alpha(L_s)\,ds} \right] < +\infty. \]

We cannot deduce this directly from equation (3.3) but, heuristically, if $r$ and $n$ are large enough then the law of large numbers gives that $\tau_n \approx n/r$ and thus

\[ \mathbb{E}\left[ e^{-\int_0^{\tau_n} q\alpha(L_s)\,ds} \right] \le C e^{-\eta n / r}, \]

and the previous sum is finite. Now, we estimate the left-hand side of the previous equation according to whether $\tau_n$ is lower than $n/r$, close to $n/r$ or higher than $n/r$. Note that the map $t \mapsto \int_0^t q\alpha(L_s)\,ds$ need not be monotone, because $\alpha$ can be negative. Let $\varepsilon > 0$ and denote by $\varrho$ the worst case of decay:

\[ \varrho = -\min\{\, q\alpha(k) \mid k \in F \,\}. \tag{3.4} \]
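If $\tau_n$ is close to $n/r$ then we have from (3.3) the bound

\begin{align*}
\mathbb{E}\left[ e^{-\int_0^{\tau_n} q\alpha(L_s)\,ds}\, 1_{\{\tau_n \in [nr^{-1}(1-\varepsilon),\, nr^{-1}(1+\varepsilon)]\}} \right]
&\le e^{2\varrho\varepsilon n/r}\, \mathbb{E}\left[ e^{-\int_0^{nr^{-1}(1-\varepsilon)} q\alpha(L_s)\,ds}\, 1_{\{\tau_n \in [nr^{-1}(1-\varepsilon),\, nr^{-1}(1+\varepsilon)]\}} \right] \\
&\le e^{2\varrho\varepsilon n/r}\, \mathbb{E}\left[ e^{-\int_0^{nr^{-1}(1-\varepsilon)} q\alpha(L_s)\,ds} \right] \\
&\le C e^{2\varrho\varepsilon n/r}\, e^{-\eta(1-\varepsilon) n/r}.
\end{align*}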



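Thereafter, we fix $\varepsilon < \eta(2\varrho + \eta)^{-1}$, so that the previous bound decays geometrically in $n$. Now, if $\tau_n$ is lower than $n/r$ then, using Markov's inequality, we have

\begin{align*}
\mathbb{E}\left[ e^{-\int_0^{\tau_n} q\alpha(L_s)\,ds}\, 1_{\{\tau_n < nr^{-1}(1-\varepsilon)\}} \right]
&\le e^{\varrho n r^{-1}(1-\varepsilon)}\, \mathbb{P}\left( \tau_n < nr^{-1}(1-\varepsilon) \right) \\
&\le e^{\varrho n r^{-1}(1-\varepsilon)}\, e^{\theta n r^{-1}(1-\varepsilon)}\, \mathbb{E}\left[ e^{-\theta \tau_n} \right] \\
&= \exp\left( -n \left( \ln\left(1 + \frac{\theta}{r}\right) - (1-\varepsilon)\frac{\theta + \varrho}{r} \right) \right),
\end{align*}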

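for every $\theta > 0$. And finally, if $\tau_n$ is higher than $n/r$ then, using the Cauchy-Schwarz and Markov inequalities, we have

\[ \mathbb{E}\left[ e^{-\int_0^{\tau_n} q\alpha(L_s)\,ds}\, 1_{\{\tau_n > nr^{-1}(1+\varepsilon)\}} \right] \le \exp\left( -\frac{n}{2} \left( \ln\left(1 - \frac{2\varrho}{r}\right) + \ln\left(1 - \frac{\theta'}{r}\right) + \frac{\theta'(1+\varepsilon)}{r} \right) \right), \]

where $\theta' > 0$. Note that in the previous inequality, we have supposed that $r > 2\varrho$. Let $\gamma \in (0,1)$; we set $\theta = \theta' = \gamma r$. We can find a large $r$ and a small $\gamma$ verifying

\[ \ln(1 + \gamma) - (1-\varepsilon)\gamma - (1-\varepsilon) r^{-1} \varrho > 0 \]

and

\[ \ln\left(1 - \frac{2\varrho}{r}\right) + \ln(1 - \gamma) + \gamma(1 + \varepsilon) > 0. \]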
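The existence of such a pair $(r, \gamma)$ can be checked numerically. The Python sketch below evaluates the two exponents, written here as $\ln(1+\gamma) - (1-\varepsilon)\gamma - (1-\varepsilon)\varrho/r$ and $\ln(1 - 2\varrho/r) + \ln(1-\gamma) + \gamma(1+\varepsilon)$. This explicit form, the choice $\gamma = \varepsilon/2$ and the doubling search on $r$ are assumptions made for illustration; the function names are hypothetical.

```python
import math


def tail_exponents(r, gamma, eps, rho):
    """Return the two quantities that must be positive for the tail bounds to decay in n."""
    low = math.log(1.0 + gamma) - (1.0 - eps) * gamma - (1.0 - eps) * rho / r
    high = math.log(1.0 - 2.0 * rho / r) + math.log(1.0 - gamma) + gamma * (1.0 + eps)
    return low, high


def find_parameters(eps, rho, max_doublings=200):
    """Search for (r, gamma) with r > 2*rho making both exponents positive.

    Assumes 0 < eps < 1 and rho >= 0. gamma is fixed to eps / 2 (an arbitrary
    small choice) and r is doubled until both conditions hold.
    """
    gamma = eps / 2.0
    r = 2.0 * rho + 1.0  # guarantees r > 2 * rho, so the logarithm is defined
    for _ in range(max_doublings):
        low, high = tail_exponents(r, gamma, eps, rho)
        if low > 0.0 and high > 0.0:
            return r, gamma
        r *= 2.0
    raise ValueError("no admissible r found; try a smaller gamma")


r, gamma = find_parameters(eps=0.1, rho=1.0)
print(r, gamma, tail_exponents(r, gamma, 0.1, 1.0))
```

The search succeeds because, as $r \to \infty$, both exponents converge to quantities of order $\varepsilon\gamma - \gamma^2/2$, which are positive for small $\gamma$.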



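Thus there exist $C' > 0$ and $\varepsilon' > 0$ such that

\[ \sum_{n \ge 0} \mathbb{E}\left[ e^{-\int_0^{\tau_n} q\alpha(L_s)\,ds} \right] \le C' \sum_{n \ge 0} e^{-\varepsilon' n} < +\infty, \]

thus concluding the proof by combining this with (3.2) and (3.3).

□

Remark 3.9. If all Markov processes contract, then the proof simplifies considerably. Indeed, if for all $i \in F$, one has $\alpha(i) \ge C > 0$, then one has

\[ \sum_{n \ge 1} \mathbb{E}\left[ e^{-\int_0^{\tau_n} q\alpha(L_s)\,ds} \right] \le \sum_{n \ge 1} \mathbb{E}\left[ e^{-qC\tau_n} \right] = \sum_{n \ge 1} \left( \frac{r}{r + qC} \right)^{n} = \frac{r}{qC} < +\infty. \]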
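The closed form in this remark rests on two standard facts: the Laplace transform of a sum of $n$ i.i.d. exponential variables of rate $r$ is $(r/(r+s))^n$, and the resulting geometric series sums to $r/s$. A short Python check of the series, with $s > 0$ standing for the uniform contraction constant (the names and numerical values are illustrative assumptions):

```python
def laplace_tau_n(r, s, n):
    """E[exp(-s * tau_n)] when tau_n is a sum of n i.i.d. exponential(r) variables."""
    return (r / (r + s)) ** n


def partial_sum(r, s, n_max):
    """Partial sum over n = 1..n_max of the Laplace transforms above."""
    return sum(laplace_tau_n(r, s, n) for n in range(1, n_max + 1))


r, s = 4.0, 0.5
print(partial_sum(r, s, 2000), r / s)  # the partial sums approach r / s
```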

3.3 The contracting distance 

This section is divided into three parts. We introduce the distance $\mathbf{d}$ that we will use in Theorem 3.6, we build our coupling in such a way that $\mathbf{d}$ will be contracting for it, and we finally prove that it is indeed contracting.

3.3.1 Definition of d 

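Here, we build a distance $\mathbf{d} : (E \times F) \times (E \times F) \to [0,1]$ such that there exist $t_* > 0$ and $\alpha \in (0,1)$ verifying

\[ \mathbf{d}(\mathbf{x}, \mathbf{y}) < 1 \ \Longrightarrow\ \forall t \ge t_*, \quad \mathcal{W}_{\mathbf{d}}(\delta_{\mathbf{x}} \mathbf{P}_t, \delta_{\mathbf{y}} \mathbf{P}_t) \le \alpha\, \mathbf{d}(\mathbf{x}, \mathbf{y}), \tag{3.5} \]

where $\mathbf{x} = (x,i)$ and $\mathbf{y} = (y,j)$ belong to $E \times F$. Since we can say nothing when $i \neq j$, we will take $\mathbf{d}(\mathbf{x}, \mathbf{y})$ constant equal to 1 in this case. When $i = j$ we want to use Assumption 1.3 to prove a decay. But it is more useful to "decrease the contraction" of the underlying Markov semigroup. More precisely, by Jensen's inequality, Assumption 1.3 gives

\[ \mathcal{W}_{d^q}(\mu P_t^{(i)}, \nu P_t^{(i)}) \le e^{-q\rho(i)t}\, \mathcal{W}_{d^q}(\mu, \nu), \]

for all $t \ge 0$, $q \in (0,1]$ and all probability measures $\mu, \nu$. Finally, we define $\mathbf{d}$ by

\[ \mathbf{d}(\mathbf{x}, \mathbf{y}) = 1_{i \neq j} + 1_{i = j}\left( \delta^{-1} d^q(x,y) \wedge 1 \right), \]

where $\delta > 0$ will be determined later. Now, if a coupling $(\mathbf{X}_t, \mathbf{Y}_t)_{t \ge 0} = ((X_t, I_t), (Y_t, J_t))_{t \ge 0}$ starting from $(\mathbf{x}, \mathbf{y})$ verifies $\mathbf{d}(\mathbf{x}, \mathbf{y}) < 1$, then $I_0 = J_0 = i = j$. So, we will try to build our coupling in such a way that $I$ and $J$ remain equal for as long as possible. More precisely, if we set

\[ T = \inf\{ s \ge 0 \mid I_s \neq J_s \}, \tag{3.6} \]

then we will prove that there exist $K > 0$ and a choice of coupling such that

\[ \mathbb{P}(T < \infty) \le K\, \mathbf{d}(\mathbf{x}, \mathbf{y}). \]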
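To make the definition concrete, here is a minimal Python sketch of the distance on $E \times F$, taking $E = \mathbb{R}$ with $d(x,y) = |x-y|$ as an illustrative assumption: the distance equals 1 when the discrete components differ, and $\delta^{-1} d^q(x,y) \wedge 1$ otherwise. The function name and the numerical values of $\delta$ and $q$ are hypothetical.

```python
def switching_distance(x, i, y, j, delta, q):
    """Distance between (x, i) and (y, j): 1 if i != j, else min(|x - y|**q / delta, 1)."""
    if i != j:
        return 1.0
    return min(abs(x - y) ** q / delta, 1.0)


# Same discrete state and nearby positions give a distance below 1.
print(switching_distance(0.0, 0, 0.04, 0, delta=0.5, q=0.5))
# Different discrete states always give distance 1, whatever the positions.
print(switching_distance(0.0, 0, 0.0, 1, delta=0.5, q=0.5))
```

In particular a value strictly below 1 forces the two discrete components to coincide, which is the property used to start the coupling from $I_0 = J_0$.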

3.3.2 Construction of our coupling 

Here, we fix $\mathbf{x} = (x,i)$, $\mathbf{y} = (y,j)$ in $\mathbf{E}$ and we let $t \ge 0$. Let $r > 0$ and $(N_t)_{t \ge 0}$ be a Poisson process of intensity $r$ with $N_t = \sum_{n \ge 1} 1_{\{\tau_n \le t\}}$ and $\tau_n = \sum_{k=1}^{n} E_k$ for a family $(E_k)_{k \ge 0}$ of i.i.d. exponential variables as before and $\tau_0 = 0$. We assume that $r > 2\bar{a}$, i.e. that $r$ is bigger than the jump rates of $I$ or $J$. As in the proof of Lemma 3.8 and Theorem 1.4, we give the construction of our coupling $(\mathbf{X}, \mathbf{Y})$ at the jump times of $N$. Let $n \in \{0, \dots, N_t\}$; we consider the following dynamics:

• If $I_{\tau_n} \neq J_{\tau_n}$ then $X_s$ and $Y_s$ evolve independently for every $s \in [\tau_n, \tau_{n+1} \wedge t)$.

• If $I_{\tau_n} = J_{\tau_n}$ then by Assumption 1.3 we can couple $X$ and $Y$ in such a way that

\[ \mathbb{E}\left[ d(X_{\tau_{n+1} \wedge t}, Y_{\tau_{n+1} \wedge t}) \mid \mathcal{G}_{\tau_n} \right] \le e^{-\rho(I_{\tau_n})(\tau_{n+1} \wedge t - \tau_n)}\, d(X_{\tau_n}, Y_{\tau_n}), \]
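where $\mathcal{G}_{\tau_n} = \sigma\{(\mathbf{X}_{\tau_n}, \mathbf{Y}_{\tau_n}), (\tau_k)_{k \ge 0}\}$.

At the jump times of $N$ the situation is different since $I$ or $J$ may jump. We will optimise the chance that $I$ and $J$ jump simultaneously. For each $n \in \mathbb{N}^*$, we cut $[0,1]$ in four parts $I_0^n$, $I_1^n$, $I_2^n$, $I_3^n$ in such a way that

\begin{align*}
\lambda(I_0^n) &= \frac{1}{r} \sum_{j \in F} \big( a(X_{\tau_n-}, I_{\tau_n-}, j) - a(Y_{\tau_n-}, J_{\tau_n-}, j) \big)_+, \\
\lambda(I_1^n) &= \frac{1}{r} \sum_{j \in F} \big( a(Y_{\tau_n-}, J_{\tau_n-}, j) - a(X_{\tau_n-}, I_{\tau_n-}, j) \big)_+, \\
\lambda(I_2^n) &= \frac{1}{r} \sum_{j \in F} a(X_{\tau_n-}, I_{\tau_n-}, j) \wedge a(Y_{\tau_n-}, J_{\tau_n-}, j), \\
\lambda(I_3^n) &= 1 - \frac{1}{r} \sum_{j \in F} a(X_{\tau_n-}, I_{\tau_n-}, j) \vee a(Y_{\tau_n-}, J_{\tau_n-}, j),
\end{align*}

where $\lambda$ is the Lebesgue measure and $(x)_+ = \max(x, 0)$. Let $(U_n)_{n \ge 0}$ be a sequence of i.i.d. random variables uniformly distributed on $[0,1]$; we couple $I$ and $J$ at the jump times as follows:

• For $U_n \in I_0^n$, $I$ jumps, but $J$ does not jump.

• For $U_n \in I_1^n$, $J$ jumps, but $I$ does not jump.

• For $U_n \in I_2^n$, $I$ and $J$ both jump simultaneously to the same location.

• For $U_n \in I_3^n$, $I$ and $J$ both stay in place.

The first components, $X$ and $Y$, do not jump. Finally, we also couple $\mathbf{X}$ and $\mathbf{Y}$ with a continuous-time Markov chain $L$ which only depends on $U$ and $N$ and which verifies

\[ \forall t \ge 0, \quad \rho(I_t) \ge \alpha(L_t). \]

This Markov chain $L$ is constructed as in Section 3.2.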
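The four-way split of $[0,1]$ used at each jump time of $N$ can be sketched as follows. Given the two rate vectors $j \mapsto a(x,i,j)$ and $j \mapsto a(y,i,j)$ and the clock intensity $r$, the lengths are built from positive parts, minima and maxima of the rates, and a uniform draw then selects the type of move. This Python sketch uses hypothetical names and illustrative rate values, and it assumes the interval lengths take the form suggested by the four cases above.

```python
def partition_lengths(rates_x, rates_y, r):
    """Lengths of the four parts of [0, 1]: I alone, J alone, joint jump, no jump."""
    only_i = sum(max(a - b, 0.0) for a, b in zip(rates_x, rates_y)) / r
    only_j = sum(max(b - a, 0.0) for a, b in zip(rates_x, rates_y)) / r
    joint = sum(min(a, b) for a, b in zip(rates_x, rates_y)) / r
    stay = 1.0 - sum(max(a, b) for a, b in zip(rates_x, rates_y)) / r
    if stay < 0.0:
        raise ValueError("r must dominate the total jump rates")
    return only_i, only_j, joint, stay


def classify(u, lengths):
    """Map a uniform draw u in [0, 1] to the type of move it triggers."""
    labels = ("I jumps alone", "J jumps alone", "joint jump", "no jump")
    acc = 0.0
    for label, length in zip(labels, lengths):
        acc += length
        if u < acc:
            return label
    return labels[-1]


lengths = partition_lengths([0.0, 1.0, 0.5], [0.0, 0.6, 0.9], r=4.0)
print(lengths, classify(0.3, lengths))
```

The probability of a separating move is the total length of the first two parts, which is proportional to the sum of $|a(x,i,j) - a(y,i,j)|$ and hence small when $x$ and $y$ are close and the rates are Lipschitz.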

Remark 3.10. This coupling is not quite Markovian since, between times $\tau_n$ and $\tau_{n+1}$, it already uses information about the pair $(X_t, Y_t)$ at time $\tau_{n+1}$. However, in many situations to which our results apply there exists a Markovian coupling with generator $\mathcal{L}^{(i,j)}$ which minimises the Wasserstein distance for each of the underlying processes. In this case, we can make our coupling Markovian with generator

\begin{align*}
\mathbf{L} f(\mathbf{x}, \mathbf{y}, n) ={}& \mathcal{L}^{(i,j)} f(\mathbf{x}, \mathbf{y}, n) + \sum_{k \in F} \big( a(x,i,k) - a(y,j,k) \big)_+ f((x,k), \mathbf{y}, n+1) \\
&+ \sum_{k \in F} \big( a(y,j,k) - a(x,i,k) \big)_+ f(\mathbf{x}, (y,k), n+1) \\
&+ \sum_{k \in F} a(x,i,k) \wedge a(y,j,k)\, f((x,k), (y,k), n+1) \\
&+ \Big( r - \sum_{k \in F} a(x,i,k) \vee a(y,j,k) \Big) f(\mathbf{x}, \mathbf{y}, n+1) - r f(\mathbf{x}, \mathbf{y}, n).
\end{align*}

3.3.3 The distance d is contracting for P



In this subsection, we show that the distance d defined above is indeed contracting for 
the coupling constructed in the previous subsection. This is formulated in the following 
result. 

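Lemma 3.11. Let $(\mathbf{X}_t, \mathbf{Y}_t)_{t \ge 0}$ be the coupling of the previous section. Under the assumptions of Theorem 3.2 we can choose $r$ and $\delta$ in such a way that

\[ \forall t \ge t_*, \quad \mathbb{E}\left[ \mathbf{d}(\mathbf{X}_t, \mathbf{Y}_t) \right] \le \gamma\, \mathbf{d}(\mathbf{x}, \mathbf{y}), \]

for some $\gamma \in (0,1)$ and $t_* > 0$, and all $\mathbf{x}, \mathbf{y} \in E \times F$ verifying $\mathbf{d}(\mathbf{x}, \mathbf{y}) < 1$.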

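Proof. Recall that since $\mathbf{d}(\mathbf{x}, \mathbf{y}) < 1$ one has $I_0 = J_0$ and that $T$, defined in (3.6), denotes the first time of separation of $I$ and $J$. Using Lemma 2.6 there exist $q \in (0,1]$ and $C, \eta > 0$ such that

\begin{align*}
\mathbb{E}\left[ \mathbf{d}(\mathbf{X}_t, \mathbf{Y}_t) \right]
&\le \mathbb{E}\left[ 1_{\{T = \infty\}}\, \frac{1}{\delta}\, d^q(X_t, Y_t) + 1_{\{T < +\infty\}} \right] \\
&\le \frac{C e^{-\eta t}}{\delta}\, d^q(x,y) + \mathbb{P}(T < +\infty) \\
&\le C e^{-\eta t}\, \mathbf{d}(\mathbf{x}, \mathbf{y}) + \mathbb{P}(T < +\infty).
\end{align*}

Here, we have used the fact that

\begin{align*}
\mathbb{E}\left[ 1_{\{T = \infty\}}\, d^q(X_t, Y_t) \right]
&\le \mathbb{E}\left[ 1_{\{T > \tau_{N_t}\}}\, \mathbb{E}\left[ d^q(X_t, Y_t) \mid \mathcal{G}_{\tau_{N_t}} \right] \right] \\
&\le \mathbb{E}\left[ 1_{\{T > \tau_{N_t}\}}\, e^{-q\rho(I_{\tau_{N_t}})(t - \tau_{N_t})}\, d^q(X_{\tau_{N_t}}, Y_{\tau_{N_t}}) \right] \\
&\le \mathbb{E}\left[ e^{-q\int_0^t \alpha(L_s)\,ds} \right] d^q(x,y).
\end{align*}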

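It remains to obtain a bound on $\mathbb{P}(T < +\infty)$. Since $I$ and $J$ can only jump when $N$ jumps, $T$ can be finite only if it is one of the jump times of $N$. So, we set

\[ A_n = \{T = \tau_n\} = \{T > \tau_{n-1} \text{ and } I_{\tau_n} \neq J_{\tau_n}\}. \]

By Assumption 1.1 we have

\begin{align*}
\mathbb{P}(A_n) &= \mathbb{P}\left( \{U_n \in I_0^n \cup I_1^n\} \cap \{T > \tau_{n-1}\} \right) \\
&\le \mathbb{E}\left[ \frac{1}{r}\, 1_{\{T > \tau_{n-1}\}} \sum_{j \in F} \left| a(X_{\tau_n-}, I_{\tau_n-}, j) - a(Y_{\tau_n-}, I_{\tau_n-}, j) \right| \right] \\
&\le \frac{\kappa}{r}\, \mathbb{E}\left[ 1_{\{T > \tau_{n-1}\}}\, d(X_{\tau_n-}, Y_{\tau_n-}) \right],
\end{align*}

where $\kappa$ denotes the Lipschitz constant of $a$ from Assumption 1.1. Hence

\[ \mathbb{P}(T < \infty) = \sum_{n \ge 1} \mathbb{P}(A_n) \le \frac{\kappa\, \delta^{1/q}}{r}\, \mathbf{d}(\mathbf{x}, \mathbf{y})^{1/q} \sum_{n \ge 1} \mathbb{E}\left[ e^{-\int_0^{\tau_n} \alpha(L_s)\,ds} \right]. \]

Now, similarly to the proof of Lemma 3.8, there exist $C' > 0$ and $\varepsilon' > 0$ verifying

\[ \sum_{n \ge 1} \mathbb{E}\left[ e^{-\int_0^{\tau_n} \alpha(L_s)\,ds} \right] \le C' \sum_{n \ge 1} e^{-\varepsilon' n} < +\infty. \]

Combining these bounds, and using $\mathbf{d}(\mathbf{x}, \mathbf{y})^{1/q} \le \mathbf{d}(\mathbf{x}, \mathbf{y})$ since $\mathbf{d}(\mathbf{x}, \mathbf{y}) < 1$, we obtain the estimate

\[ \mathbb{E}\left[ \mathbf{d}(\mathbf{X}_t, \mathbf{Y}_t) \right] \le \left( C e^{-\eta t} + C'' \delta^{1/q} \right) \mathbf{d}(\mathbf{x}, \mathbf{y}), \]

for some constant $C'' > 0$ which depends neither on $\delta$ nor on $t$. First making $\delta$ sufficiently small and then taking $t$ large enough, we thus obtain the announced result. □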

3.4 Bounded sets are d-small 

Here, we prove that if a set is bounded then it is d-small. 

Lemma 3.12. Under the assumptions of Theorem 3.2, if $S \subset E \times F$ is of bounded diameter in the sense that

\[ R = \sup\{ d(x,y) \mid \mathbf{x}, \mathbf{y} \in S \} < +\infty, \]

then there exist $t_*, t^* > 0$ such that $S$ is $\mathbf{d}$-small for $\mathbf{P}_t$, for all $t \in [t_*, t^*]$.

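Proof. Let $\mathbf{x} = (x,i)$ and $\mathbf{y} = (y,j)$ be two different points of $S$. By Assumption 3.1 there exists $i_0 \in F$ such that $\rho(i_0) > 0$. Let $(\mathbf{X}_t)_{t \ge 0}$ and $(\mathbf{Y}_t)_{t \ge 0}$ be two independent processes generated by (1.2) and starting respectively from $\mathbf{x}$ and $\mathbf{y}$. Let us denote

\[ \tau_{in} = \inf\{ t \ge 0 \mid I_t = J_t = i_0 \} \quad \text{and} \quad \tau_{out} = \inf\{ t \ge \tau_{in} \mid I_t \neq i_0 \text{ or } J_t \neq i_0 \}. \]

For every $b, c > 0$ such that $b > c$, we define

\[ p_{c,b}(\mathbf{x}, \mathbf{y}) = \mathbb{P}\left( \tau_{in} < c,\ \tau_{out} > b \right). \]

By Assumptions 1.1 and 1.2 we have $p_{c,b}(\mathbf{x}, \mathbf{y}) > 0$. Using the fact that $a$ is bounded, a coupling argument shows that $p_{c,b}$ is lower bounded by a positive quantity which only depends on $i$ and $j$. We then obtain the bound

\begin{align*}
\mathbb{E}\left[ \mathbf{d}(\mathbf{X}_t, \mathbf{Y}_t) \right]
&\le 1 - p_{c,b}(\mathbf{x}, \mathbf{y}) + \mathbb{E}\left[ 1_{\{\tau_{in} < c,\ \tau_{out} > b\}}\, \mathbf{d}(\mathbf{X}_t, \mathbf{Y}_t) \right] \\
&\le 1 - p_{c,b}(\mathbf{x}, \mathbf{y}) \left( 1 - \delta^{-1} e^{\varrho c} e^{-\rho(i_0) t}\, d(x,y) \right) \\
&\le 1 - p_{c,b}(\mathbf{x}, \mathbf{y}) \left( 1 - \delta^{-1} e^{\varrho c} e^{-\rho(i_0) t}\, R \right),
\end{align*}

where $\varrho$ was defined in (3.4). There exist $c > 0$ and $t_* > c$ such that $1 - \delta^{-1} e^{\varrho c} e^{-\rho(i_0) t_*} R > 0$. Since $F$ is finite, we can furthermore bound $p_{c,b}$ from below by the minimum over all $i, j \in F$, and the result follows for any $b > t_*$ and $t^* \in (t_*, b)$. □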

Remark 3.13. One can see from this proof that it is not necessary that the jump rates are lower bounded, as in Assumption 1.2. Indeed, we only need that, for each $i, j \in F$, the jump times of $I$ are stochastically smaller than a variable which does not depend on the dynamics of $X$.

3.5 Proofs of Theorem [O] and Theorem ^

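Lemma 2.7 and Lemma 3.8 give the existence of a Lyapunov function $V$, Lemma 3.11 shows that $\mathbf{d}$ is contracting for $\mathbf{P}$, and Lemma 3.12 proves that sublevel sets of $V$ are $\mathbf{d}$-small. So we can use Theorem 3.6 to deduce that there exist a probability measure $\pi$ and some constants $C, \lambda, t_0 > 0$ such that

\[ \forall t \ge t_0, \quad \mathcal{W}_{\tilde{\mathbf{d}}}(\mu \mathbf{P}_t, \pi) \le C e^{-\lambda t}\, \mathcal{W}_{\tilde{\mathbf{d}}}(\mu, \pi), \]

for every probability measure $\mu$ on $\mathbf{E}$. In this expression, $\tilde{\mathbf{d}}$ is defined by

\[ \tilde{\mathbf{d}}(\mathbf{x}, \mathbf{y}) = \sqrt{ \big( 1_{i \neq j} + 1_{i = j} (1 \wedge d^q(x,y)) \big) \big( 1 + d^q(x, x_0) + d^q(y, x_0) \big) }, \]

where $\mathbf{x} = (x,i)$, $\mathbf{y} = (y,j)$ belong to $\mathbf{E}$, $x_0$ is as in Assumption 1.3 and $q \in (0,1]$. We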
conclude the proof by noting that d < d. 

4 Two special cases 

Here, we give some sufficient conditions allowing one to verify our main assumptions in situations where the underlying processes are deterministic or diffusive. Note that we can find sufficient conditions in [Clo12] for stochastically monotone processes, in [CJ10] for birth-death processes and in [Ebe11] for diffusion processes.

4.1 The case of diffusion processes 

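Let us recall that a diffusion process on $\mathbb{R}^d$, $d \in \mathbb{N}^*$, is a process generated by

\[ \forall x \in \mathbb{R}^d, \quad \mathcal{L} f(x) = \sum_{i=1}^{d} b_i(x)\, \partial_i f(x) + \sum_{i,j=1}^{d} \sigma_{i,j}(x)\, \partial_{i,j} f(x), \tag{4.1} \]

where $f$ is a smooth enough function and $b$, $\sigma$ are regular enough, say

\[ \forall x, y \in \mathbb{R}^d, \quad \|\sigma(x) - \sigma(y)\| + \|b(x) - b(y)\| \le K \|x - y\|, \]

for some $K > 0$.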

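Lemma 4.1. Let $(P_t)_{t \ge 0}$ be the Markov semigroup generated by (4.1). If

\[ \forall x, y \in \mathbb{R}^d, \quad \langle b(x) - b(y), x - y \rangle \le -\alpha \|x - y\|^2, \]

for some $\alpha \in \mathbb{R}$, then

\[ \forall t \ge 0, \quad \mathcal{W}_{\|\cdot\|}(\mu P_t, \nu P_t) \le e^{-\alpha t}\, \mathcal{W}_{\|\cdot\|}(\mu, \nu), \]

for any probability measures $\mu$ and $\nu$.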

Proof. The usual proof drives two solutions of the SDE, started from different initial measures, with the same Brownian motion. □
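The synchronous coupling mentioned in this proof can be illustrated on the simplest case. The Python sketch below makes several assumptions for illustration: dimension one, the linear drift $b(x) = -\alpha x$ (which satisfies the hypothesis of the lemma with equality), a constant diffusion coefficient, and an Euler-Maruyama discretisation. With additive noise the Brownian increments cancel in the difference of the two copies, so the gap decays deterministically.

```python
import math
import random


def synchronous_coupling_gap(x0, y0, alpha, sigma, t_end, n_steps, rng):
    """Run two Euler-Maruyama copies of dX = -alpha X dt + sigma dW with shared noise.

    Returns |X_t - Y_t| at time t_end.
    """
    h = t_end / n_steps
    x, y = x0, y0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(h))  # the same Brownian increment drives both copies
        x += -alpha * x * h + sigma * dw
        y += -alpha * y * h + sigma * dw
    return abs(x - y)


rng = random.Random(0)
gap = synchronous_coupling_gap(2.0, -1.0, alpha=1.5, sigma=0.7, t_end=2.0, n_steps=2000, rng=rng)
print(gap, 3.0 * math.exp(-1.5 * 2.0))  # the gap stays below |x0 - y0| * exp(-alpha * t)
```

For a non-constant diffusion coefficient the noise no longer cancels in the difference, and the computation requires an It\^o correction that this sketch does not cover.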

Assumptions of Theorem 1.7 or Theorem 1.8 are satisfied if one of the underlying diffusions verifies Hörmander's hypoellipticity assumption. See for instance [Hai11] for an introduction on this subject.

Remark 4.2 (Exponential convergence for an infinite dimensional process). The previous result also gives the convergence for switching Fokker-Planck processes. Indeed, we can consider that each underlying Markov process $(Z_t^{(i)})_{t \ge 0}$ is deterministic, belongs to the space of smooth density functions, and verifies

d d 
k=l kd=l 

for all $x \in \mathbb{R}^d$ and $t \ge 0$. The previous lemma gives a contraction as in Assumption 1.3 for each underlying process, where $d$ is the Wasserstein metric.

4.2 Case of piecewise deterministic Markov processes 

Let us assume that each one of the underlying Markov processes is actually deterministic. More precisely, we consider that $\mathcal{L}^{(i)} f = G^{(i)} \cdot \nabla f$, for every $i \in F$, where $(G^{(i)})_{i \in F}$ is a family of vector fields such that the ordinary differential equations $x' = G^{(i)}(x)$ have a unique and global solution for any initial condition, for every $i \in F$. Lemma 4.1 gives the assumption in order to apply Theorem 1.4 and Theorem 3.2. In general, we cannot apply Theorem 1.7 or Theorem 3.3, but [BH12, BLMZ12b] give a sufficient condition ensuring that $\mathbf{X}$ generates densities:

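Assumption 4.3 (Hörmander-type bracket conditions). Let $\mathcal{G}_0 = \{ G^{(i)} - G^{(j)},\ i \neq j \}$ and for all $k \ge 0$,

\[ \mathcal{G}_{k+1} = \{ [G^{(i)}, G] \mid i \in F,\ G \in \mathcal{G}_k \}, \]

where $[\cdot, \cdot]$ denotes the Lie bracket. We have $\mathcal{G}_k(x) = \{ G(x) \mid G \in \mathcal{G}_k \} =$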
In this case our main theorem gives 

Theorem 4.4. Let us suppose that Assumptions 1.1, 1.2 and 4.3 hold. If one of the two following assumptions is satisfied:

• $a(x,i,j)$ does not depend on $x$ and $I$ is ergodic with an invariant measure $\nu$
satisfying

> 0; 

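• Assumption 3.1 holds and

\[ \sum_{i \in F} \nu(i)\, \alpha(i) > 0, \]

for some increasing sequence $\alpha$ satisfying $\alpha(n) \le \min_{i \in F_n} \lambda(i)$, for all $n \le \bar{n}$,

then there exist a probability measure $\pi$ and three constants $C, \lambda, t_0 > 0$ such that

\[ \forall t \ge t_0, \quad d_{TV}(\delta_{\mathbf{x}} \mathbf{P}_t, \pi) \le C e^{-\lambda t} (1 + V(x)), \]

for every $\mathbf{x} = (x, i) \in \mathbf{E}$.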

Proof. Using [BLMZ12b, Theorem 6.6], we see that compact sets are small for $\mathbf{X}$. Using Lemma 2.7 in the first case and Lemma 3.8 in the second case, we see that we can apply Theorem 2.10. □



5 Examples 

Here, we give three simple examples to illustrate our results. 






5.1 The most elementary example 

Let us consider the example where $X$ belongs to $\mathbb{R}$ and verifies

\[ \forall t \ge 0, \quad \partial_t X_t = I_t X_t, \]

where $(I_t)_{t \ge 0}$ is the continuous-time Markov chain on $\{-1, 1\}$ which jumps from $1$ to $-1$ with rate $a_1 > 0$ and from $-1$ to $1$ with rate $a_{-1} > 0$. If $a_1 > a_{-1}$ then Theorems 1.4 and 1.5 give the exponential ergodicity of $\mathbf{X}$ in the Wasserstein distance. Here, the invariant law is

\[ \delta_0 \otimes \frac{1}{a_{-1} + a_1} \left( a_1 \delta_{-1} + a_{-1} \delta_1 \right), \]

and there is clearly no convergence in total variation. Thus, the classical Harris' Theorem does not work here. Furthermore, the classical law of large numbers gives

\[ \lim_{t \to +\infty} |X_t| = \begin{cases} 0 & \text{a.s., if } a_1 > a_{-1}, \\ +\infty & \text{a.s., if } a_1 < a_{-1}. \end{cases} \]

In particular, there is no convergence when $a_1 < a_{-1}$.
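This dichotomy is easy to observe numerically, since $X_t = X_0 \exp(\int_0^t I_s\,ds)$ and the time average of $I$ converges to $(a_{-1} - a_1)/(a_1 + a_{-1})$ by the ergodic theorem. The Python sketch below simulates the two-state chain exactly through its exponential holding times; the function name, the seed and the rate values are illustrative assumptions.

```python
import random


def time_average_of_I(a_1, a_minus1, horizon, rng, i0=1):
    """Return (1/T) * integral_0^T I_s ds, which equals log(|X_T| / |X_0|) / T."""
    t, i, integral = 0.0, i0, 0.0
    while t < horizon:
        rate = a_1 if i == 1 else a_minus1  # exit rate of the current state
        hold = min(rng.expovariate(rate), horizon - t)
        integral += i * hold
        t += hold
        i = -i  # the chain flips between +1 and -1
    return integral / horizon


rng = random.Random(0)
avg = time_average_of_I(a_1=3.0, a_minus1=1.0, horizon=5000.0, rng=rng)
print(avg, (1.0 - 3.0) / (3.0 + 1.0))  # empirical growth rate vs. ergodic limit
```

A negative value means $|X_t|$ decays exponentially along the simulated path; swapping the two rates makes the same quantity positive.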

Remark 5.1. In our main theorems, we use a Wasserstein distance associated to a distance comparable to $d^q$ rather than $d$. We choose this distance because, in general, moments of $X$ can explode even though $\mathbf{X}$ converges in law. For instance, in the above example, one has $\lim_{t \to \infty} \mathbb{E}[|X_t|] = \infty$ as soon as $a_1 < 1$. See also [BGM10] for comments on the optimal choice of the parameter $q$.

5.2 Wasserstein contraction of some switching dynamical systems 

Let us consider a slight generalisation of the previous example; that is, $X$ belongs to $\mathbb{R}$ and verifies

\[ \forall t \ge 0, \quad \partial_t X_t = -a(I_t) X_t, \tag{5.1} \]

where $(I_t)_{t \ge 0}$ is a recurrent continuous-time Markov chain on a finite state space $F$ and $a$ is a function from $F$ to $\mathbb{R}$. Theorem 1.4 gives the exponential Wasserstein ergodicity under the condition that

\[ \sum_{i \in F} a(i)\, \nu(i) > 0, \]

where $\nu$ is an invariant measure of $I$. This simple example satisfies a bound like in Assumption 1.3. Indeed we have

Lemma 5.2. Under assumption (5.1), there is a distance $\delta$ on $\mathbf{E}$ such that the Wasserstein curvature of the semigroup of $\mathbf{X}$ is positive, i.e. there exists $\lambda > 0$ such that

\[ \forall t \ge 0, \quad \mathcal{W}_{\delta}(\delta_{\mathbf{x}} \mathbf{P}_t, \delta_{\mathbf{y}} \mathbf{P}_t) \le e^{-\lambda t}\, \delta(\mathbf{x}, \mathbf{y}), \]

for all $\mathbf{x}, \mathbf{y} \in \mathbf{E}$.
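The averaged condition $\sum_{i \in F} a(i)\nu(i) > 0$ stated before the lemma can be checked numerically for a given finite chain. The Python sketch below computes the invariant law by power iteration on the uniformised chain and then evaluates the average. The rate matrix, the function names and the uniformisation constant are illustrative assumptions, and the sketch presumes that the chain is irreducible.

```python
def invariant_measure(Q, n_iter=2000):
    """Invariant law of an irreducible finite rate matrix Q (rows sum to zero)."""
    n = len(Q)
    r = 2.0 * max(-Q[i][i] for i in range(n))  # strictly above every exit rate
    # Uniformised transition matrix P = Id + Q / r; it has the same invariant law as Q.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / r for j in range(n)] for i in range(n)]
    nu = [1.0 / n] * n
    for _ in range(n_iter):
        nu = [sum(nu[i] * P[i][j] for i in range(n)) for j in range(n)]
    return nu


def averaged_rate(Q, a):
    """Sum over i of a(i) * nu(i), where nu is the invariant law of Q."""
    nu = invariant_measure(Q)
    return sum(ai * ni for ai, ni in zip(a, nu))


# Two states: state 0 expands (a = -1) and is left at rate 3,
# state 1 contracts (a = +1) and is left at rate 1.
Q = [[-3.0, 3.0], [1.0, -1.0]]
print(invariant_measure(Q), averaged_rate(Q, [-1.0, 1.0]))
```

Here the chain spends three quarters of its time in the contracting state, so the average is positive even though one of the two flows expands.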

Proof. Firstly, let us give a complement on the conclusion of Lemma 2.6. The Markov chain $I$ satisfies its assumptions and, using the results of [BGM10], there exist a function $\psi$ on $F$, $\rho > 0$ and $p \in (0,1)$ verifying



Vi > 0, E 



-p'E mo)] .
Now let $\delta$ be the distance on $\mathbf{E}$ defined by

Vx, y e E, ^(x, y) = - y\P + | + mivl" + D, 



where

\[ \overline{\psi} = \max_{k \in F} \psi(k) \quad \text{and} \quad \underline{\psi} = \min_{k \in F} \psi(k). \]



Now, using the fact that 



the proof is straightforward. 



□ 



5.3 Surprising blow-up under exponential ergodicity assumptions 

Here we give some comments on [BLMZ12a, Example 1.4], which also illustrate the sharpness of our criteria. Let us consider $E = \mathbb{R}^2$, $F = \{0, 1\}$, $\mathcal{L}^{(i)} f(x) = A_i x \cdot \nabla f(x)$ where



^0 



-1 
-1/3 



-1 



and Ai 



-1 



-1/3 
-1 



$a(x, 0, 0) = a(x, 1, 1) = 0$, and $a(x, 1, 0) = a(x, 0, 1) = a > 0$, for all $x \in \mathbb{R}^2$. In short, $\mathbf{X}$ is generated, for all $x \in \mathbb{R}^2$ and $i \in \{0, 1\}$, by

\[ \mathbf{L} f(x, i) = A_i x \cdot \nabla f(x, i) + a \left( f(x, 1 - i) - f(x, i) \right). \tag{5.2} \]

Since $a$ does not depend on its first component, $I$ is a Markov process and it converges exponentially to the uniform law $\frac{1}{2}(\delta_0 + \delta_1)$.

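For each $i \in \{0, 1\}$, we have $\partial_t Z_t^{(i)} = A_i Z_t^{(i)}$ and thus we easily prove that

\[ \|Z_t^{(i)}\|_i \le e^{-t}\, \|Z_0^{(i)}\|_i \quad \text{and} \quad \|Z_t^{(i)}\|_{1-i} \le 3 e^{-t}\, \|Z_0^{(i)}\|_{1-i}, \tag{5.3} \]

for every $t \ge 0$, where the norms $\|\cdot\|_0$ and $\|\cdot\|_1$ are defined by

\[ \forall u = (u_1, u_2) \in \mathbb{R}^2, \quad \|u\|_0 = \sqrt{(u_1/3)^2 + u_2^2} \quad \text{and} \quad \|u\|_1 = \sqrt{u_1^2 + (u_2/3)^2}. \]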

Thus each flow $i \in \{0, 1\}$ contracts, with the norm $\|\cdot\|_i$, and converges geometrically, with the norm $\|\cdot\|_{1-i}$, to the same limit. Nevertheless, if $a$ is large enough then [BLMZ12a, Example 1.4] shows that

\[ \lim_{t \to +\infty} \|X_t\| = +\infty. \]

In particular, the conclusion of Theorem 1.4 is not satisfied. This illustrates the fact that assuming that each underlying dynamics converges geometrically is not sufficient in general to guarantee the convergence of $\mathbf{X}$. Moreover, this shows that it is essential in Theorem 1.4 to measure the constants $\rho(i)$ with respect to the same distance for every
i. Note that the Wasserstein curvature of Z*^**, with respect to || • is negative and 
given by -74/6.

5.4 Non-convergence when I is recurrent but not positive recurrent

A last example is the following: the process $X$ verifies

\[ \forall t \ge 0, \quad dX_t = -(X_t - a_{I_t})\,dt, \]

where $(a_n)_{n \ge 0}$ is a bounded real sequence and $I$ is an irreducible and recurrent continuous-time Markov chain which is not positive recurrent. It is easy to see that the family of laws of $(X_t)_{t \ge 0}$ is tight and we can hope that there exists a probability measure $\pi$ verifying

\[ \lim_{t \to \infty} \mathbb{E}[f(X_t)] = \int f \, d\pi, \]

for every continuous and bounded function $f$ and any starting distribution. But in general, this is false. To illustrate it, let us consider the case when $I$ is the classical continuous-time random walk on $\mathbb{N}$ reflected at 0. Namely, $I$ is generated by

if $i \neq 0$ and

\[ Jf(0) = f(1) - f(0). \]

The sequence $a$, on the other hand, is defined recursively by:

a„ if n ^ {2*^ I fc e N}, 



-a„ if n G {2'' I A: G N} 



In this case, the central limit theorem gives that $I_t \approx \sqrt{t}$ and so, for very large times, $I$ and $a$ do not switch on the same time scale. As a matter of fact, the process $a_{I_t}$ stays constant during longer and longer stretches of time. It is then possible to find two sequences of deterministic times $(t_n)_{n \ge 0}$ and $(s_n)_{n \ge 0}$, both converging to infinity, and such that

\[ \lim_{n \to \infty} \mathbb{E}[f(X_{t_n})] = f(0) \quad \text{and} \quad \lim_{n \to \infty} \mathbb{E}[f(X_{s_n})] = f(1). \]

Thus this process exhibits ageing and is not exponentially stable, even though there exists $C > 0$ such that for any two starting points $\mathbf{x} = (x,i)$ and $\mathbf{y} = (y,j)$, we have



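\[ \forall t > 0, \quad \mathcal{W}_{d_0}(\delta_{\mathbf{x}} \mathbf{P}_t, \delta_{\mathbf{y}} \mathbf{P}_t) \le \frac{C}{\sqrt{t}}\, |i - j|, \]

where $d_0(\mathbf{x}, \mathbf{y}) = 1_{i = j} \|x - y\| \wedge 1 + 1_{i \neq j}$.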